The efficient production of biofuels from cellulosic feedstocks will require efficient fermentation of the sugars in hydrolyzed plant material. Unfortunately, plant hydrolysates also contain many compounds that inhibit microbial growth and fermentation. We used DNA-barcoded mutant libraries to identify genes that are important for hydrolysate tolerance in both Zymomonas mobilis (44 genes) and Saccharomyces cerevisiae (99 genes). Overexpression of a Z. mobilis tolerance gene of unknown function (ZMO1875) improved its specific ethanol productivity 2.4-fold in the presence of miscanthus hydrolysate. However, a mixture of 37 hydrolysate-derived inhibitors was not sufficient to explain the fitness profile of plant hydrolysate. To deconstruct the fitness profile of hydrolysate, we profiled the 37 inhibitors against a library of Z. mobilis mutants and modeled fitness in hydrolysate as a mixture of fitness in its components. By examining outliers in this model, we identified methylglyoxal as a previously unknown component of hydrolysate. Our work provides a general strategy to dissect how microbes respond to a complex chemical stress and should enable further engineering of hydrolysate tolerance.

Introduction

Concerns over energy security, global warming, and rising petroleum prices have led to a renewed interest in the development of technologies for cost-effective production of ethanol or other biofuels from renewable resources (Hill et al, 2006; US Department of Energy, 2011). Lignocellulosic biomass, such as wood and grasses, can provide a sufficient quantity of feedstock material that can be converted into biofuels, and a variety of energy crops are currently being considered for use in the United States, such as Miscanthus giganteus (miscanthus) and Panicum virgatum (switchgrass) (Somerville et al, 2010; Youngs and Somerville, 2012). The grand challenge is to produce large quantities of a commodity chemical within economic constraints. In the case of cellulosic ethanol, technoeconomic analysis has been used to determine which steps of the production process, if optimized, would have the greatest impact on the minimum selling price (Klein-Marcuschamer et al, 2010, 2011; Kumar and Murthy, 2011; Tao et al, 2011; Vicari et al, 2012). Two critical steps are the conversion of biomass into fermentable hexose and pentose sugars (mainly glucose and xylose) and their subsequent fermentation into ethanol.



Conversion of biomass into sugars is typically a two-step process involving pretreatment and enzymatic hydrolysis. Although many variations exist, we have focused on the dilute-acid hydrolysis method, which is considered a viable option for commercial-scale cellulosic ethanol production (Wyman et al, 2005). The high temperatures and pressures typically used for dilute-acid pretreatment result in the co-production of inhibitory compounds (derived from sugar and lignin degradation) in addition to fermentable sugars. The resulting mixture, or plant hydrolysate, contains at least 60 inhibitory compounds (Clark and Mackie, 1984; Palmqvist and Hahn-Hägerdal, 2000a, b). These inhibitors significantly impair the growth and fermentation of bacteria and yeast, thus preventing efficient biofuel production. There are three common approaches to deal with fermentation inhibitors: prevent their formation by reducing the severity of pretreatment, remove inhibitors by detoxification, or improve the tolerance of the host organism (Palmqvist and Hahn-Hägerdal, 2000a; Nilvebrant et al, 2001; Martin et al, 2007a; Parawira and Tekere, 2011; Stoutenburg et al, 2011). All three methods can be successful and are expected to have a significant economic impact (Klein-Marcuschamer et al, 2010; Tao et al, 2011), but here we focus on improving tolerance.

There are two complementary approaches to develop more tolerant strains: laboratory evolution in the presence of plant hydrolysates and rational strain engineering (Larsson et al, 2001; Heer and Sauer, 2008; Chen, 2010; Yang et al, 2010a, b; Liu, 2011; Agrawal et al, 2012; Fujitomi et al, 2012). Regardless of the approach, the resulting strains are often not fully tolerant, and further improvements in ethanol yield and productivity are needed (Keller et al, 1998; Martin et al, 2007b; Heer and Sauer, 2008). Many of these studies have focused on single inhibitory compounds present in hydrolysate or on simple model mixtures as a proxy for plant hydrolysates (Palmqvist et al, 1999; Liu, 2011). Given the complexity of inhibitors in actual hydrolysate, it is challenging to predict and engineer the genetic changes necessary for tolerance based on these previous studies.

Given this challenge, we present here an experimental and computational approach to dissect the effects of complex chemical mixtures, such as plant hydrolysates, on the growth and fermentation of the bacterium Zymomonas mobilis and the yeast Saccharomyces cerevisiae. Both microbes are being considered for commercial-scale cellulosic ethanol production. Z. mobilis is currently being used as part of a DuPont industrial-scale cellulosic ethanol process (http://www.ddce.com/), and S. cerevisiae has a long history of industrial-scale ethanol production from corn in the United States and from sugarcane in Brazil (Wheals et al, 1999).

First, we used a functional genomics approach, based on chemogenomic profiling of mutant libraries in Z. mobilis and S. cerevisiae, to identify genes that are important for growth in plant hydrolysates. Chemogenomic profiling with DNA barcodes was pioneered in S. cerevisiae, and we have recently adapted the technology for use in bacteria (Giaever et al, 2002, 2004; Oh et al, 2010; Deutschbauer et al, 2011). These previous studies, and other technologies for profiling large mutant pools, have led to key insights into the function of unknown genes and the mechanism of action of inhibitory compounds (Sassetti et al, 2001; Giaever et al, 2004; Langridge et al, 2009; Smith et al, 2009; Van Opijnen et al, 2009; Hillenmeyer et al, 2010; Deutschbauer et al, 2011). The putative hydrolysate tolerance genes that we identified are attractive targets for strain improvement programs, and we demonstrate here, by systematic overexpression of Z. mobilis tolerance genes, that we can rationally engineer improved fermentation in miscanthus hydrolysate. Most of our fitness experiments were carried out under aerobic growth conditions because these were most compatible with our high-throughput protocols. However, we also performed some experiments under anaerobic growth conditions to better match the environment of an industrial bioreactor, and identified seven additional Z. mobilis tolerance genes (see Discussion).

Second, we deconstructed the complex biological response to plant hydrolysates by obtaining chemogenomic profiles for each of 37 individual hydrolysate-derived inhibitors and for synthetic mixtures of known components. We modeled fitness in hydrolysate as a mixture of fitness in individual components and identified outliers in our model that led to the discovery of a previously unknown chemical component, methylglyoxal, which is present in our miscanthus plant hydrolysate and contributes to overall toxicity. In sum, our combined experimental and computational approach provides a general strategy for understanding how microbes respond to a complex chemical stress, for identifying critical unknown chemical components in plant hydrolysates, and for rationally engineering strains with improved hydrolysate tolerance and fermentation properties.
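One simple way to express "fitness in hydrolysate as a mixture of fitness in its components" is a linear model: each gene's hydrolysate fitness is approximated as a weighted sum of that gene's fitness in each component, and genes with large negative residuals are the outliers worth examining. The sketch below is a minimal illustration under our own assumptions: ordinary least squares, a hypothetical function name `fit_mixture_model`, and hypothetical toy data. The model actually used in this study may differ, for example by constraining the weights to be non-negative (as `scipy.optimize.nnls` would).

```python
import numpy as np


def fit_mixture_model(component_fitness, hydrolysate_fitness):
    """Fit hydrolysate fitness as a linear mixture of component fitness.

    component_fitness: (n_genes, n_components) array; column j holds the
        gene fitness values measured in hydrolysate component j.
    hydrolysate_fitness: (n_genes,) array of gene fitness in hydrolysate.
    Returns (weights, residuals), where residual = observed - predicted.
    """
    A = np.asarray(component_fitness, dtype=float)
    b = np.asarray(hydrolysate_fitness, dtype=float)
    # Ordinary least squares: an illustrative choice, not necessarily the
    # fitting procedure used in the study.
    weights, *_ = np.linalg.lstsq(A, b, rcond=None)
    residuals = b - A @ weights
    return weights, residuals


# Hypothetical toy data: 5 genes x 2 components.
comp = np.array([
    [-1.0, 0.0],
    [0.0, -1.0],
    [-0.5, -0.5],
    [0.0, 0.0],
    [0.0, 0.0],
])
# Hydrolysate profile built as 0.6 * component 1 + 0.4 * component 2, except
# gene 4, which is sick only in "hydrolysate" (an unexplained outlier).
hyd = comp @ np.array([0.6, 0.4])
hyd[4] = -2.0

weights, resid = fit_mixture_model(comp, hyd)
print(np.round(weights, 3))   # fitted mixture weights
print(int(np.argmin(resid)))  # gene with the most negative residual
```

In this toy case the fit recovers the weights 0.6 and 0.4, and gene 4 stands out because none of the modeled components explains its defect, which is the kind of signal that could point to an unidentified hydrolysate component.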

Results 

Plant hydrolysate composition and effects on Z. mobilis growth and fermentation

To explore the natural diversity of hydrolysate composition and to determine the effect of feedstock-derived inhibitors on the growth and fermentation of Z. mobilis, we developed a microwave oven-based protocol to hydrolyze M. giganteus (miscanthus) and P. virgatum (switchgrass) plant material at high temperature and pressure. The procedure was designed to mimic, at the laboratory scale, a dilute-acid hydrolysis pretreatment method that can be used for the industrial-scale production of cellulosic biofuels (Tao et al, 2011). We prepared six hydrolysate samples from miscanthus or switchgrass grown at different field sites in Illinois and two additional hydrolysate samples from a mixture of miscanthus grown at multiple field sites. All eight hydrolysate samples were analyzed using a combination of GC/MS and LC-RID/DAD; the composition of these mixtures, including 4 sugars (glucose, xylose, arabinose, and cellobiose) and 37 potential inhibitors, is shown in Supplementary Table 1. As expected, the harsh hydrolysis conditions (low pH and high temperature) resulted in the production of sugar dehydration products, such as furfural and 5-hydroxymethylfurfural (5-HMF), and their degradation products, formic acid and levulinic acid, respectively. In addition, we detected a wide variety of phenolic compounds derived from lignin degradation (Palmqvist and Hahn-Hägerdal, 2000b; Klinke et al, 2004). Based on clustering of the inhibitor and sugar concentrations, we identified three groups of samples with distinct chemical compositions: miscanthus, switchgrass, and the batch miscanthus samples (Supplementary Figure 1). Despite the different field locations, the miscanthus hydrolysates were more similar to each other than to the switchgrass hydrolysates, and vice versa. By contrast, the two batch miscanthus samples (batch 1 and batch 2) formed a third independent cluster, which is likely explained by the higher temperature used for their processing (200°C versus 180°C) and the resulting higher concentrations of inhibitors (Supplementary Table 1).

Figure 1 Miscanthus hydrolysate inhibits Z. mobilis growth and ethanol production. Batch fermentation profiles for the wild-type Z. mobilis strain carrying an empty control plasmid (WT + pJS71) either in (A) rich media (RM) or in (B) rich media supplemented with 8% (v/v) batch 2 miscanthus hydrolysate (HZ). Data shown are the average of four replicates, and error bars indicate standard deviation.

To understand the inhibitory effect of hydrolysate, we first determined the concentration of hydrolysate that, when added to rich media, significantly inhibited the growth of Z. mobilis. Consistent with their similar, but not identical, chemical compositions, all eight hydrolysate samples inhibited growth of Z. mobilis at concentrations ranging from about 8 to 20% (v/v) (Supplementary Figure 2). The most potent hydrolysate samples were the batch miscanthus samples, which were prepared at a higher temperature, consistent with their higher inhibitor concentrations (Supplementary Table 1). We also determined the effects of plant hydrolysate on the fermentation profile of Z. mobilis by carrying out small-scale aerobic batch fermentations in the absence (Figure 1A) or presence of 8% (v/v) batch 2 hydrolysate (Figure 1B). In the presence of hydrolysate, specific ethanol productivity was reduced about three-fold (0.10 versus 0.27 g/l/h/OD600). Taken together, our hydrolysate preparations provide a complex mixture of inhibitors that serves as a model for industrial-scale dilute-acid hydrolysates.
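The three-fold figure can be checked directly from the two reported values. Here we read the unit g/l/h/OD600 as volumetric ethanol productivity normalized to culture density (our interpretation of the unit, not a definition stated in this section), and use RM and HZ to label the two conditions of Figure 1:

```latex
q_{\mathrm{EtOH}} = \frac{1}{\mathrm{OD}_{600}} \frac{d[\mathrm{EtOH}]}{dt}
\quad \left[\mathrm{g\,l^{-1}\,h^{-1}\,OD_{600}^{-1}}\right],
\qquad
\frac{q_{\mathrm{RM}}}{q_{\mathrm{HZ}}} = \frac{0.27}{0.10} = 2.7 \approx 3
```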



Generating a genome-wide Z. mobilis barcoded transposon library

To understand the genetic basis of hydrolysate tolerance, we first mapped 14 008 random DNA-barcoded transposon insertions in Z. mobilis ZM4 using protocols recently developed in our laboratory for Shewanella oneidensis MR-1 (Deutschbauer et al, 2011). From our collection of 14 008 mutants, we derived two Z. mobilis mutant pools of 3716 barcoded strains each that together represent 1620 of the 1892 (86%) annotated protein-coding genes (Materials and methods; Supplementary Table 2). We designed two mutant pools because this provided optimal genome-wide coverage given a limited set of barcodes (about 4200), and it allowed us to include some strains in both pools and measure them twice, which provided an internal control for experimental consistency. These mutant pools were used to perform competitive growth assays, or pooled fitness experiments, in which the relative abundance of strains in our two pools was quantified (Giaever et al, 2002; Pierce et al, 2006; Oh et al, 2010). Before all experiments, we recovered the mutant pools from frozen stocks and used these cells as the starting inoculum for fitness experiments. By comparing strain abundance after growth in the experimental condition (END) to strain abundance in the starting inoculum (START), we calculated a log2 ratio, or strain fitness value, for each mutant strain in that condition. We define 'gene fitness' as the average strain fitness value for insertions within that gene (see Materials and methods for details). Negative gene fitness values indicate that a gene is important for growth in the condition of interest, that is, transposon mutants of that gene should grow poorly in that condition. In contrast, positive gene fitness values indicate that the gene is detrimental to growth in the condition of interest, that is, transposon mutants of that gene have improved growth relative to the typical strain. After final data filtering, we obtained gene fitness data for 1578 of 1892 (83%) protein-coding genes. Because many genes had multiple transposon insertions (1336/1578 had >1 insertion), and because the same strain was sometimes present in both pools, we were able to make an average of 3.5 fitness measurements per gene.
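The two fitness definitions above can be sketched in a few lines. This is a minimal illustration only: the function names, the strain and gene identifiers, and the pseudocount are hypothetical, and the normalization and filtering described in Materials and methods are omitted.

```python
import math
from collections import defaultdict
from statistics import mean


def strain_fitness(start, end, pseudocount=1.0):
    """Strain fitness = log2(END / START) for every strain seen in both samples.

    start, end: dict strain_id -> abundance (e.g. a normalized barcode signal).
    The pseudocount is an assumption added here to avoid dividing by zero.
    """
    return {s: math.log2((end[s] + pseudocount) / (start[s] + pseudocount))
            for s in start if s in end}


def gene_fitness(strain_fit, strain_to_gene):
    """Gene fitness = mean strain fitness over all insertions in that gene."""
    per_gene = defaultdict(list)
    for strain, fit in strain_fit.items():
        gene = strain_to_gene.get(strain)
        if gene is not None:  # skip intergenic or unmapped insertions
            per_gene[gene].append(fit)
    return {gene: mean(fits) for gene, fits in per_gene.items()}


# Hypothetical abundances for three barcoded strains in two genes.
start = {"s1": 99, "s2": 99, "s3": 99}
end = {"s1": 24, "s2": 49, "s3": 399}
fits = strain_fitness(start, end)
print(fits)  # {'s1': -2.0, 's2': -1.0, 's3': 2.0}
print(gene_fitness(fits, {"s1": "geneA", "s2": "geneA", "s3": "geneB"}))
# {'geneA': -1.5, 'geneB': 2.0}
```

Here geneA, with two insertions that both became depleted, gets a negative gene fitness (important for growth), while geneB gets a positive one (detrimental to growth).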

We were surprised to find transposon insertions within the central 5-80% of most genes regardless of whether they were expected to be essential. In fact, we mapped insertions in 82% of predicted essential genes (Supplementary Table 3), which is not significantly less than the rate of 85% for other genes (P = 0.10, Fisher's exact test). To our knowledge, this is the first example of a large-scale transposon mutagenesis study in bacteria with this unusual distribution of insertion sites. To study this further, we first examined six mutants by PCR using primers that flanked each transposon site. Three of these mutants were in predicted essential genes (leuS::TN5, ftsZ::TN5, and rpoB::TN5), and three mutants were in non-essential genes (ZMO0759::TN5, ZMO1490::TN5, and ZMO1723::TN5). In the predicted essential gene mutants, we amplified two bands that correspond to the wild-type and mutant copies of leuS, ftsZ, and rpoB, respectively (Supplementary Figure 4A and B). By contrast, transposon insertions in non-essential genes had only a single band by PCR analysis, corresponding to the mutant copy. Mutants with two bands by PCR were classified as 'mixed' and were examined further by using a transposon stability assay and comparative genome hybridization (Materials and methods; Supplementary Figure 4C-L). Based on these experiments, we concluded that Z. mobilis is polyploid (i.e., has multiple copies of its main 2-Mbp chromosome), and that insertions in essential genes are heterozygous and unstable in the absence of kanamycin selection, whereas insertions in non-essential genes are homozygous and stable (Supplementary Note 1; Supplementary Figure 5). Alternatively, polyploidy might be a rare event that is selected for only when an essential gene is mutated; however, because the rate of insertion in essential genes is similar to that in non-essential genes, we believe that polyploidy is the normal state for Z. mobilis. In this study, we focused our single-mutant follow-up experiments on stable, homozygous KanR mutants. Although this is unusual, other bacteria, such as Deinococcus radiodurans, Thermus thermophilus, Synechococcus spp., Sinorhizobium meliloti bacteroids, and Epulopiscium sp. type B, are known to be polyploid and to have multiple copies of their chromosome (Masters et al, 1991; Mergaert et al, 2006; Ohtani et al, 2010; Griese et al, 2011; Angert, 2012).
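The comparison of insertion rates in predicted essential versus other genes is a Fisher's exact test on a 2x2 table (genes with or without a mapped insertion, split by predicted essentiality). A minimal pure-Python sketch of the two-sided test follows; the function name is ours, and it is checked on Fisher's classic tea-tasting table rather than on this study's counts, which are in Supplementary Table 3. In practice, `scipy.stats.fisher_exact` provides the same test.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probabilities of every table that is
    no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given the margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # The small relative tolerance keeps floating-point ties with the
    # observed table from being dropped.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Fisher's "lady tasting tea" table: 3 of 4 milk-first cups identified.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```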

We validated our pooled fitness assay by growing the mutant pools in minimal media, where we had a strong prediction of which genes should have fitness defects. Most of the 53 annotated amino-acid synthesis genes had strong fitness defects in minimal media and were rescued by the addition of casamino acids (CAA) (Supplementary Figure 6A-C; Supplementary Table 4). We were unable to rescue mutants of genes required for tryptophan biosynthesis (aroABCDE and trpDE), which is explained by the absence of tryptophan from CAA: tryptophan is destroyed during the acid hydrolysis of casein used to prepare CAA. As a second test of auxotroph rescue, we grew the Z. mobilis transposon pools in the presence of methionine, and this specifically rescued the fitness defect of six genes (metCEFWXZ), all predicted to be required for methionine biosynthesis (Supplementary Figure 6C). We then carried out single-mutant follow-up studies on a metC::TN5 mutant to confirm that fitness defects in the pooled assay can be recapitulated at the single-strain level (Supplementary Figure 6D). As expected, growth of metC::TN5 on minimal media could be rescued by the addition of methionine.

In addition to our biological tests, we calculated two metrics (strain and operon correlation), as previously described (Deutschbauer et al, 2011), that were used to measure overall experiment quality and to flag potential experimental errors (Supplementary Figure 6E-G). In the typical experiment, the correlation of fitness for the 1057 identical strains contained in both pools was 0.89, and the correlation of fitness between adjacent genes in the same operon was 0.50. Experiments with poor quality metrics (operon correlation <0.4 or strain correlation <0.75) were repeated, and kept if they were reproducible across multiple biological replicates. In sum, our validation confirms that our Z. mobilis pooled fitness assay provides accurate and biologically meaningful results that can be used to study hydrolysate tolerance.
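The two quality metrics and their cutoffs can be sketched as plain Pearson correlations: one over strains measured in both pools, one over pairs of adjacent genes in the same operon. The exact definitions are those of Deutschbauer et al (2011); treating both metrics as simple Pearson correlations, and the names `quality_check` and the toy values, are our assumptions for illustration.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def quality_check(pool1, pool2, gene_fit, operon_pairs,
                  min_strain_r=0.75, min_operon_r=0.4):
    """Return (strain_r, operon_r, flagged) for one fitness experiment.

    pool1, pool2: strain -> strain fitness in each pool; only strains present
        in both pools contribute to the strain correlation.
    gene_fit: gene -> gene fitness.
    operon_pairs: (gene1, gene2) tuples of adjacent genes in the same operon.
    """
    shared = sorted(pool1.keys() & pool2.keys())
    strain_r = pearson([pool1[s] for s in shared], [pool2[s] for s in shared])
    pairs = [(g1, g2) for g1, g2 in operon_pairs
             if g1 in gene_fit and g2 in gene_fit]
    operon_r = pearson([gene_fit[g1] for g1, _ in pairs],
                       [gene_fit[g2] for _, g2 in pairs])
    # Flag the experiment for repetition if either metric is below its cutoff.
    flagged = strain_r < min_strain_r or operon_r < min_operon_r
    return strain_r, operon_r, flagged


# Hypothetical values for one experiment.
pool1 = {"s1": -2.0, "s2": 0.1, "s3": 1.0}
pool2 = {"s1": -1.8, "s2": 0.0, "s3": 0.9, "s4": 0.3}
genes = {"g1": -1.0, "g2": -0.9, "g3": 0.2, "g4": 0.3, "g5": 1.0, "g6": 1.2}
operons = [("g1", "g2"), ("g3", "g4"), ("g5", "g6")]
print(quality_check(pool1, pool2, genes, operons))
```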



Genome-wide fitness profiling of Z. mobilis mutants in plant hydrolysates and 37 chemical components

Using high-throughput culturing methods that we recently developed for Shewanella oneidensis MR-1 (Deutschbauer et al, 2011), we performed 189 whole-genome mutant fitness experiments with our two Z. mobilis pools. These 189 experiments represent 58 unique experimental conditions, including miscanthus and switchgrass hydrolysates, 2 types of synthetic hydrolysate mixtures (SYN-37 and SYN-10), 37 individual compounds that are present in miscanthus hydrolysate, and 11 other stress conditions (Supplementary Table 5). Most fitness experiments were performed in rich media without inhibitors and in rich media supplemented with hydrolysate or specific compounds of interest. We tested each potential growth inhibitor at several different concentrations to identify a suitable concentration for mutant fitness experiments. In general, we selected the concentration that caused about a two-fold increase in doubling time and gave a fitness pattern that was reproducibly different from the baseline rich media condition (see Materials and methods).
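The growth part of the concentration-selection rule (about a two-fold increase in doubling time) can be expressed as a nearest-to-target choice over a dose series. This is a hypothetical helper, sketched under the assumption that a doubling time has been measured at each tested concentration; it does not capture the second criterion, a fitness pattern reproducibly different from rich media, which must still be checked separately.

```python
def pick_concentration(control_doubling_h, doubling_by_conc, target_ratio=2.0):
    """Return the tested concentration whose doubling time is closest to
    target_ratio times the doubling time of the no-inhibitor control.

    doubling_by_conc: concentration -> measured doubling time (hours).
    Only the growth criterion is modeled here (a hypothetical helper).
    """
    return min(doubling_by_conc,
               key=lambda c: abs(doubling_by_conc[c] / control_doubling_h - target_ratio))


# Hypothetical dose series for one inhibitor (concentrations in mM).
series = {1.0: 2.4, 2.5: 3.8, 5.0: 6.5}
print(pick_concentration(2.0, series))  # 2.5
```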

An overview of our fitness data set is shown as a clustered heat map of average gene fitness values (Figure 2). Gene fitness values were averaged across replicate conditions, such as rich media (24 replicates) or identical concentrations of the same inhibitor. We also averaged the gene fitness values for 37 hydrolysate experiments (Supplementary Figure 7) that included both miscanthus and switchgrass samples (from single field sites or batch material). The complete, non-averaged data set can be found in Supplementary Dataset 1 and online (http://genomics.lbl.gov/supplemental/zm4hydrolysate/).

Two-dimensional hierarchical clustering of the averaged gene fitness data revealed two broad categories. One large group represents genes that, when mutated, have little or no fitness defect in the 58 conditions we tested (left half, Figure 2). Additional experimental conditions might uncover phenotypes for these mutants. The second large group had fitness defects in nearly all 58 conditions, including rich media (right half, Figure 2). Using a cutoff of a fitness value of -1 or less, we identified 402 genes that were important for growth in rich media and 1184 that were not. Of these 402 genes, 185 (46%) are expected to be essential (see Materials and methods), while just 5% of the other 1184 genes are expected to be essential (P<10~^^, Fisher's exact test). Because this large cluster of genes had negative fitness values in our baseline condition (rich media), and many are likely to be essential, they were not pursued further in this study. In addition, we found that mutants in expected essential genes were more likely to have a low abundance in our starting pool (Supplementary Figure 3B), making it more difficult to perform accurate fitness measurements on these strains.

Clustering the fitness data by condition revealed a few large groups of similar conditions, such as organic acids (Y axis, near bottom, Figure 2). In many cases, these broad groups included chemicals of different classes, and it was not clear why they formed a cluster. However, within these broad groups, we found that chemicals with highly similar structures clustered closely (clusters 1-10 on the Y axis, Figure 2). In 10 cases, these closely related chemicals differed by a single functional group, such as a single hydroxyl, ester, methyl, or C=C group (Supplementary Figure 8). For example, cluster 1 on the Y axis contains two chemicals (2,5-dihydroxybenzoic acid and 3-hydroxybenzoic acid) that differ by a single hydroxyl group. In sum, chemicals with closely related structures had very similar fitness profiles. Our results are consistent with previous chemogenomic studies and likely reflect the underlying similarity in the mechanism of action of structurally related inhibitory compounds (Hillenmeyer et al, 2010).
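The clustering described in the Figure 2 legend (agglomerative, complete linkage; Euclidean distance for genes, Pearson correlation for conditions) can be sketched in pure Python. Converting Pearson similarity into the distance 1 - r is our assumption, and the condition names and values are hypothetical. This brute-force version recomputes pairwise distances at every merge, so it suits a few dozen conditions rather than ~1600 genes; `scipy.cluster.hierarchy.linkage` with `method="complete"` and `metric="euclidean"` or `metric="correlation"` would be the practical choice.

```python
from itertools import combinations
from statistics import mean


def euclidean(x, y):
    """Euclidean distance (used for gene profiles)."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


def correlation_distance(x, y):
    """1 - Pearson r (used here to turn condition similarity into a distance)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / (sxx * syy) ** 0.5


def complete_linkage(profiles, dist):
    """Naive agglomerative clustering with complete linkage.

    profiles: name -> list of fitness values.
    Returns merges as (cluster_a, cluster_b, height), each cluster a
    frozenset of names, in the order the merges happen.
    """
    clusters = [frozenset([name]) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Complete linkage: distance between clusters = largest
            # distance between any pair of their members.
            h = max(dist(profiles[a], profiles[b])
                    for a in clusters[i] for b in clusters[j])
            if best is None or h < best[2]:
                best = (i, j, h)
        i, j, h = best
        merges.append((clusters[i], clusters[j], h))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical condition profiles (fitness of four genes per condition).
conditions = {
    "acid_1": [-1.0, -2.0, 0.0, 0.5],
    "acid_2": [-1.1, -2.2, 0.1, 0.4],
    "aldehyde_1": [0.5, 0.0, -2.0, -1.0],
}
for a, b, h in complete_linkage(conditions, correlation_distance):
    print(sorted(a), sorted(b), round(h, 3))
```

The two structurally similar "acid" profiles merge first, mirroring how closely related chemicals cluster together on the condition axis.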



Figure 2 Genome-wide fitness profiling of Z. mobilis in 58 experimental conditions, including plant hydrolysate and 37 individual components of hydrolysate. Average gene fitness data are represented as a two-dimensional heat map for 1586 genes (X axis) and 58 experimental conditions (Y axis). For each transposon mutant in our pool, strain fitness is calculated as the log2 ratio of (END/START). Gene fitness values are the average of per-strain fitness across all insertions within that gene and are displayed according to the color bar at the top right of the heat map. Gene fitness values have also been averaged across replicate experimental conditions. Chemicals with similar structures cluster together on the Y axis (labeled 1-10) and differ by a single functional group, as colored by the key at the top left. For example, compounds 2,5-dihydroxybenzoic acid and 3-hydroxybenzoic acid (cluster 1) differ by a single hydroxyl group (see Supplementary Figure 8 for more examples). The fitness data were clustered in both dimensions by hierarchical agglomerative clustering with complete linkage. Euclidean distance was used as the distance metric for genes, and Pearson's correlation was used as the similarity metric for experimental conditions. Hydrolysate components are indicated by red text.

Identification of 44 putative hydrolysate tolerance genes in Z. mobilis

To identify genes that are important for growth in hydrolysate, we searched for mutants that showed a significant difference in fitness between rich media supplemented with hydrolysate and plain rich media. Based on this criterion (fitness_hydrolysate < -1 and fitness_hydrolysate < fitness_rich - 1), we identified 44 putative hydrolysate tolerance genes that were further grouped into 7 functional categories (Figure 3A; Table I; Supplementary Dataset 2). In contrast, using a second criterion (fitness_hydrolysate > 0.5 and fitness_hydrolysate > fitness_rich + 0.75), we found only one gene that was detrimental for growth in hydrolysate, ZMO1496, which encodes phosphoenolpyruvate (PEP) carboxylase (fitness in hydrolysate = 0.72 versus rich media = -0.48). Because signal intensity on microarrays can saturate, we believe that our pooled fitness assays have a reduced sensitivity for mutants with positive fitness, which may explain why we identified only one mutant in the positive fitness category. In this study, we pursued only the negative fitness mutants.
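The two selection criteria above amount to a simple filter on per-gene average fitness. A minimal sketch, with a hypothetical function name: ZMO1496 uses the fitness values reported in the text, while geneT and geneE are hypothetical. geneE illustrates a gene that is sick in both conditions (for example, a likely essential gene) and is therefore not called a tolerance gene.

```python
def classify_genes(fit_hyd, fit_rich):
    """Split genes using the two cutoffs in the text.

    fit_hyd, fit_rich: gene -> average gene fitness in hydrolysate and in
        plain rich media.
    Returns (tolerance_genes, detrimental_genes) as sorted lists.
    """
    tolerance, detrimental = [], []
    for gene in fit_hyd.keys() & fit_rich.keys():
        h, r = fit_hyd[gene], fit_rich[gene]
        if h < -1 and h < r - 1:          # needed for growth in hydrolysate
            tolerance.append(gene)
        elif h > 0.5 and h > r + 0.75:    # detrimental in hydrolysate
            detrimental.append(gene)
    return sorted(tolerance), sorted(detrimental)


hyd = {"geneT": -2.0, "geneE": -3.0, "ZMO1496": 0.72}
rich = {"geneT": -0.2, "geneE": -2.8, "ZMO1496": -0.48}
print(classify_genes(hyd, rich))  # (['geneT'], ['ZMO1496'])
```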

The 44 putative tolerance genes included 8 auxotrophs, 5 genes involved in cytochrome c biogenesis or encoding cytochrome c-containing proteins, 4 efflux pump-related genes, 6 glutathione-related genes, 8 genes related to membranes or the cell wall, 3 regulators, and 10 other genes (Table I). The variety of predicted gene functions suggests that the cellular response to growth in hydrolysate is complex and that hydrolysate tolerance is a multigenic trait. This is consistent with the fact that plant hydrolysates are complex mixtures containing a variety of chemical classes (e.g., weak organic acids, furans, aldehydes, and phenolics) that likely affect cell physiology through different mechanisms of action.

To confirm the pooled fitness results, 25 transposon mutants were selected from our set of 44 tolerance genes for detailed follow-up studies (Supplementary Table 6). This group of strains represents examples from each of the seven functional classes we identified. Each mutant was re-streaked to a single colony isolate, and its transposon insertion site was verified by PCR and DNA sequencing. We then used our transposon stability assay to check whether the insertion was stable or mixed. Nine of the twenty-five mutants had mixed transposon insertions and were not studied further. Four of these nine mixed mutants were in a single operon encoding an efflux pump (ZMO1429-ZMO1432). Of the remaining 16 stable mutants, we confirmed fitness defects for 13 when grown in batch 1 or batch 2 miscanthus hydrolysate (e.g., see Supplementary Figure 9). In addition, we complemented three mutants (ZMO0100::TN5, ZMO1722::TN5, and ZMO0759::TN5) by expression of the corresponding wild-type gene on a plasmid, demonstrating that the observed phenotype is due to a single gene defect (Supplementary Figure 10). In sum, these data demonstrate that our pooled fitness assay can be used to identify bona fide hydrolysate tolerance genes that are critical for growth in plant hydrolysate.

To further understand the specific function of each putative tolerance gene, we examined the fitness pattern for these 44 genes in each of the 37 components (colored red) known to be present in miscanthus and switchgrass hydrolysates (Figure 3B). Broadly, most of the tolerance genes had fitness defects in many of the individual hydrolysate components, which makes it difficult to infer a specific detoxification function for any particular tolerance gene. Some of the tolerance genes might respond to or detoxify a class of compounds, such as aldehydes, which could explain the lack of a one-to-one relationship between gene fitness and hydrolysate component. While more detailed follow-up studies will be required to determine the specific biochemical functions of these tolerance genes, we find that hydrolysate tolerance genes with related functions cluster together on our fitness data heat map (X axis, Figure 3B). For example, one cluster contains four auxotrophs that are part of the sulfate assimilation pathway (cysCHIJ, encoded by ZMO0003, ZMO0007, ZMO0008, and ZMO0009), and a second cluster contains cytochrome c peroxidase (CCP) (ZMO1136) and two genes that are involved in cytochrome c biogenesis (ZMO1389 and ZMO1252). Many of these fitness clusters contain genes that are predicted to form operons, which is also consistent with their shared function (ZMO0100-ZMO0101, ZMO0007-ZMO0009, ZMO1874-ZMO1875, ZMO0200-ZMO0201, ZMO1429-ZMO1432).



Five hydrolysate tolerance genes (ZMO0760, ZMO0100, ZMO0101, ZMO1722, and ZMO1723) did not exhibit a fitness defect in any of the 37 inhibitors we tested. These hydrolysate-specific mutants might be affected by some unknown component of hydrolysate or only by a combination of inhibitors. To test the latter possibility, we made two synthetic hydrolysate mixtures based on the composition of miscanthus batch 1; one contained the 10 most abundant inhibitors (SYN-10; furfural, acetic acid, formic acid, levulinic acid, succinic acid, 5-HMF, 2-furoic acid, vanillin, vanillic acid, and syringaldehyde), and the second contained the full set of 37 inhibitors and four sugars: glucose, xylose, arabinose, and cellobiose (SYN-37). The composition of SYN-10 and SYN-37 was verified using GC/MS and LC-RID and closely matched the values for batch 1 hydrolysate (Supplementary Table 1).



Fitness profiling of synthetic hydrolysate mixtures in Z. mobilis and S. cerevisiae

We first examined the effect of SYN-10 and SYN-37 on the growth of Z. mobilis and found that both mixtures were less potent than the batch 1 and batch 2 hydrolysates and inhibited growth similarly to each other (Supplementary Figure 11A, P<10~^, analysis of variance (ANOVA)). This suggests that our synthetic mixtures are missing critical inhibitory components. To further understand this difference, we performed pooled fitness assays in the presence of SYN-10 or SYN-37 (Supplementary Figure 12A; Figure 4A). The fitness profiles of SYN-10 and SYN-37 were very similar (i^^ = 0.807, Supplementary Figure 12B), consistent with their similar growth effects on Z. mobilis. However, a plot of average fitness in SYN-37 versus average fitness in hydrolysate shows that nine genes are outliers (fitness_hydrolysate < -1 and fitness_SYN-37 > -1/3, enclosed by dashed black lines in Figure 4A). Of these 9 genes, 5 are important for growth in hydrolysate but not in any of the 37 components (ZMO0760, ZMO1722, ZMO1723, ZMO0100, and ZMO0101; all fitness_components > -1; leftmost cluster in Figure 3B). In addition, a heat map of the 44 tolerance genes shows that SYN-10 and SYN-37 are more similar to each other than to hydrolysate (Figure 4B). In sum, our synthetic mixtures do not fully recapitulate the growth and fitness effects of real hydrolysate, and the presence of outliers indicates that there are unidentified hydrolysate components that contribute to its overall toxicity.
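The outlier definition above (fitness_hydrolysate < -1 and fitness_SYN-37 > -1/3) is again a filter on per-gene average fitness, this time comparing real hydrolysate with a synthetic mixture. A minimal sketch; the function name, gene names, and values are hypothetical. Genes like geneA are candidates for responding to a hydrolysate component that is missing from SYN-37.

```python
def hydrolysate_specific_outliers(fit_hyd, fit_syn, hyd_cut=-1.0, syn_cut=-1 / 3):
    """Genes that are sick in real hydrolysate but close to neutral in a
    synthetic mixture, i.e. fitness_hyd < hyd_cut and fitness_syn > syn_cut."""
    return sorted(g for g in fit_hyd.keys() & fit_syn.keys()
                  if fit_hyd[g] < hyd_cut and fit_syn[g] > syn_cut)


# Hypothetical average fitness values.
hydrolysate = {"geneA": -2.0, "geneB": -1.5, "geneC": -0.2}
syn37 = {"geneA": -0.1, "geneB": -1.4, "geneC": 0.0}
print(hydrolysate_specific_outliers(hydrolysate, syn37))  # ['geneA']
```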

To determine whether synthetic mixtures can recapitulate 
the fitness profile of plant hydrolysates in other organisms, we 
chose Saccharomyces cerevisiae, for which a genome-wide 
collection of DNA-barcoded deletion strains is available 



Figure 3 Identification of 44 Z. mobilis genes that are important for growth and 1 gene that is detrimental for growth in plant hydrolysate. (A) Scatter plot of gene fitness values in rich media (average of 24 experiments) versus gene fitness in hydrolysate (average of 37 experiments) for 1586 Z. mobilis genes. A dashed grey line indicates x = y, and dashed black lines indicate the cutoffs used to select tolerance genes. Putative tolerance genes have a more negative fitness value in hydrolysate than in rich media and are indicated by colored symbols. They are further classified based on their predicted function, as indicated in the legend. A single gene (ZMO1496), indicated by a black circle, was found to be detrimental for growth in plant hydrolysate. (B) Subset of average gene fitness data from Figure 2, showing only the fitness data for the 44 tolerance genes. The 37 hydrolysate components are indicated by red text. Arrows indicate the baseline condition without any added inhibitors (rich media), or rich media supplemented with DMSO (DMSO) or plant hydrolysate (hydrolysate). Gene fitness values are colored according to the color bar at the top right of the heat map. Each tolerance gene on the X axis is labeled by its systematic gene name (ZMOxxxx), and the clustering is colored based on predicted functional classes, as in (A). Tolerance genes were clustered by Euclidean distance, and conditions were clustered as in Figure 2.
Table I Table of 44 Z. mobilis hydrolysate tolerance genes identified in this study

Gene     Class  Annotation

ZMO0003  auxo   Adenylylsulfate kinase
ZMO0007  auxo   Phosphoadenosine phosphosulfate reductase
ZMO0008  auxo   Sulfite reductase subunit beta
ZMO0009  auxo   Sulfite reductase subunit alpha
ZMO0100  reg    Transcriptional regulator, HxlR family
ZMO0101  mem    NAD-dependent epimerase/dehydratase
ZMO0192  mem    Peptidase M48 Ste24p
ZMO0200  auxo   Anthranilate phosphoribosyltransferase
ZMO0201  auxo   Glutamine amidotransferase of anthranilate synthase
ZMO0418  other  Hypothetical protein
ZMO0429  other  Iron-sulfur cluster assembly accessory protein
ZMO0468  auxo   Anthranilate synthase component I
ZMO0472  other  rpsU-divergently transcribed protein
ZMO0483  auxo   Homoserine dehydrogenase
ZMO0675  other  Transcriptional regulator, LysR family
ZMO0759  glt    Hydroxyacylglutathione hydrolase
ZMO0760  glt    Glyoxalase/bleomycin resistance protein/dioxygenase
ZMO0763  other  Tetratricopeptide domain protein
ZMO0774  reg    Transcriptional regulator, LysR family
ZMO0846  mem    Sodium/hydrogen exchanger
ZMO0975  mem    Conserved hypothetical membrane spanning protein
ZMO1067  other  Fe-S metabolism-associated SufE
ZMO1136  cytc   Cytochrome c peroxidase
ZMO1221  other  Hypothetical protein
ZMO1252  cytc   Cytochrome c biogenesis factor-like protein
ZMO1253  cytc   Cytochrome c biogenesis protein
ZMO1255  cytc   Cytochrome c assembly protein
ZMO1389  cytc   Cytochrome c biogenesis protein transmembrane region
ZMO1391  mem    Diacylglycerol kinase catalytic region
ZMO1404  reg    RNA polymerase, sigma-24 subunit, ECF subfamily
ZMO1429  eff    RND efflux system, outer membrane lipoprotein, NodT family
ZMO1430  eff    Efflux transporter, RND family, MFP subunit
ZMO1431  eff    Hypothetical protein
ZMO1432  eff    Fusaric acid resistance protein conserved region
ZMO1490  mem    Hypothetical protein
ZMO1520  other  Hypothetical protein
ZMO1541  other  Ferrous iron transport protein B
ZMO1590  mem    ABC transporter related protein
ZMO1659  other  ATP-dependent metalloprotease FtsH
ZMO1715  mem    Biopolymer transport protein ExbD/TolR
ZMO1722  glt    S-(hydroxymethyl)glutathione dehydrogenase/class III alcohol dehydrogenase
ZMO1723  glt    Hypothetical protein
ZMO1874  glt    BolA family protein
ZMO1875  glt    Hypothetical protein

Systematic gene names, their functional class, and annotation are listed.
Functional classes are auxo (auxotroph), cytc (cytochrome c), eff (efflux), glt
(glutathione related), mem (membrane), other (other), and reg (regulator).
Annotations were obtained from RefSeq (http://www.ncbi.nlm.nih.gov/RefSeq/).



(Giaever et al, 2002). First, we profiled the S. cerevisiae 
homozygous deletion library (as a pool) in the presence of 
batch 1 hydrolysate. To identify the putative tolerance genes in 
S. cerevisiae, we searched for mutants with a significant fitness 
defect in rich media (YPD) supplemented with hydrolysate, 
but not in the rich media baseline condition. Using the same 
selection criterion as for Z. mobilis (fitness_hydrolysate < -1 and
fitness_hydrolysate < fitness_rich - 1), we identified 99 yeast
hydrolysate tolerance genes. As in Z. mobilis, the S. cerevisiae
tolerance genes represent a variety of functional categories and
pathways, including 12 regulatory genes, 12 amino-acid
biosynthesis genes, 4 pentose phosphate pathway genes, 15 



membrane/secretion-related genes, and 5 oxidant-induced 
cell-cycle arrest (OCA) genes (Table II; Supplementary 
Figure 13; Supplementary Dataset 3). The overlap with 
the Z. mobilis tolerance genes was limited to two genes involved
in amino-acid biosynthesis (see Discussion).
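The two-part tolerance criterion can be written as a simple per-gene filter. In this sketch the gene names and fitness values are hypothetical, and `margin` encodes the requirement that the defect in hydrolysate be at least one fitness unit worse than in the rich-media baseline.

```python
def tolerance_genes(fit_hydrolysate, fit_rich, cut=-1.0, margin=1.0):
    """Genes whose mutants are strongly sick in hydrolysate (fitness < cut)
    and at least `margin` sicker than in the rich-media baseline."""
    return sorted(
        gene
        for gene, f_hyd in fit_hydrolysate.items()
        if f_hyd < cut and f_hyd < fit_rich[gene] - margin
    )


# Hypothetical average fitness values (illustration only).
rich = {"g1": 0.0, "g2": -1.0, "g3": -0.7, "g4": 0.0}
hydrolysate = {"g1": -2.0, "g2": -1.5, "g3": -1.8, "g4": -0.5}

print(tolerance_genes(hydrolysate, rich))  # ['g1', 'g3']
```

Here g2 fails the second condition: its mutant is already sick in rich media, so the additional defect in hydrolysate is only 0.5. The same rule applies to the Z. mobilis data and to the anaerobic experiments described in the Discussion.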

We then examined the effects of SYN-10 and SYN-37 on 
S. cerevisiae growth. In contrast to Z. mobilis, we found that the
primary effect of either synthetic or genuine hydrolysate was
an increase in the length of lag phase rather than a reduction in
growth rate. SYN-10 and SYN-37 increased the length of lag
phase less than batch 1 or batch 2 hydrolysate did
(Supplementary Figure 11B; P<10"^^ ANOVA). To examine
this difference in detail, we profiled the S. cerevisiae mutant
pool in SYN-10 and SYN-37 (Figure 4C; Supplementary
Figure 12C). The fitness profiles of SYN-10 and SYN-37 are
highly similar (Figure 4D; Supplementary Figure 12D;
r² = 0.860), consistent with their similar effects on lag phase
length. As in Z. mobilis, a plot of average fitness in SYN-37
versus average fitness in batch 1 hydrolysate uncovered 20
outliers among our set of 99 tolerance genes (fitness_hydrolysate
< -1 and fitness_SYN-37 > -1/3) that have significant fitness
defects in real hydrolysate but not in SYN-37 (listed on the plot in
Figure 4C).

Taken together, our results demonstrate that in both bacteria 
and yeast, synthetic hydrolysate mixtures of up to 37 
compounds do not fully recapitulate the fitness effects of real 
hydrolysate, strongly suggesting that we are missing critical 
inhibitors from our synthetic mixtures. Consistent with our 
fitness data, there are many unidentified peaks in our 
hydrolysate GC/MS chromatograms. Prioritizing which peaks 
to study further using analytical chemistry is difficult without 
additional knowledge regarding the potential contribution of 
each peak to overall hydrolysate toxicity. To address this 
problem, we used chemogenomic profiling of our 37 known 
hydrolysate components and a computational model to search 
for key missing inhibitors. 



Modeling Z. mobilis hydrolysate fitness and 
identification of methylglyoxal as a previously 
unknown hydrolysate component 

Using the Z. mobilis fitness data for the 37 compounds, we first 
modeled average gene fitness in hydrolysate as a linear 
combination of its fitness in each component. Our model 
included fitness in the baseline condition (rich media) and in 
the 16 (of 37 tested) components that significantly improved 
the fit. The full linear model, including the list of components,
can be found in Supplementary Table 7. Our model (Model-16)
correlates well with fitness in hydrolysate (r² = 0.880,
Figure 5A). This fit is at least as good as the one we obtained with
experimental fitness data from SYN-37 (r² = 0.810,
Figure 4A). To test for possible synergistic effects of inhibitors,
we also tested a model that included non-linear interaction terms.
We identified three significant pairs (see Materials and
methods): formic acid × levulinic acid (P<10~^^),
furfural × 4-hydroxyphenylacetic acid (P<10~^^), and
furfural × vanillin (P< 10 ~ ^). Adding these terms to our linear
model makes relatively little difference overall (adjusted r²
rises from 0.880 to 0.893). Unlike previous growth and
Figure 4 Synthetic hydrolysate mixtures do not fully explain the fitness profile of real hydrolysate. Two synthetic hydrolysate mixtures containing either 37 components
(SYN-37) or the 10 most abundant components (SYN-10) were made based on the composition of miscanthus batch 1 (Supplementary Table 1). Data for SYN-10 are
shown in Supplementary Figure 12. (A) Scatterplot of Z. mobilis gene fitness data in SYN-37 (average of 4 experiments) versus in hydrolysate (average of
37 experiments). The 44 Z. mobilis tolerance genes are color coded by category. Nine outlier genes (defined by two dashed black lines) have more negative gene fitness
values in hydrolysate than in SYN-37, and are listed in a black box and color coded by category. (B) Heatmap of gene fitness data for the 44 Z. mobilis tolerance genes. 
Genes were clustered by Euclidean distance with complete linkage using all non-averaged fitness data. Fitness values are colored according to the color bar. 
The baseline conditions are rich media (ZRMG) and rich media supplemented with DMSO (DMSO). (C) Scatterplot of S. cerevisiae gene fitness data in SYN-37 (average 
of 6 experiments) versus in batch 1 miscanthus hydrolysate (average of 6 experiments). In all, 99 putative tolerance genes are color coded according to their function as 
indicated on the graph legend. Twenty of these genes are outliers (defined by two dashed black lines) and have more negative fitness values in hydrolysate than in 
SYN-37. Outlier genes are listed in a black box and color coded according to the legend. (D) Heatmap of gene fitness data for the 99 S. cerevisiae tolerance genes. 
Genes were clustered as in (B). Fitness values are colored according to the color bar. Two baseline conditions are shown: YPD is the rich media used for S. cerevisiae 
growth (n = 3) and ZRMG is the rich media used for Z. mobilis growth that was also used to prepare the SYN-10 and SYN-37 synthetic hydrolysate mixtures (n = 2, 
see Materials and methods). 



fermentation studies that have documented synergistic
inhibitor combinations (Zaldivar and Ingram, 1999; Zaldivar
et al, 1999; Oliva et al, 2004), our modeling suggests
that the fitness effects of hydrolysate are primarily additive
(see Discussion).
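The additive model and the interaction test amount to ordinary least-squares regression of per-gene hydrolysate fitness on per-gene fitness in each condition, scored by adjusted r² so that extra terms are penalized. The sketch below is a minimal pure-Python version under stated assumptions: an intercept is included, an interaction is modeled as the element-wise product of two component profiles, and the toy numbers are hypothetical. It does not reproduce the component-selection procedure used to build Model-16 (Supplementary Table 7).

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_additive_model(predictors, y):
    """OLS fit of hydrolysate fitness y on an intercept plus one column per
    condition. Returns (coefficients, r2, adjusted_r2); coefficient 0 is the intercept."""
    n, p = len(y), len(predictors)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = p + 1
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[a] * row[b] for row in rows) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return beta, r2, adj_r2


# Hypothetical per-gene fitness in the baseline and two components.
rich = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, 0.0, -0.3]
comp_a = [-1.0, -0.5, 0.2, -2.0, 0.1, -0.3, -1.5, 0.4]
comp_b = [0.0, -1.2, -0.4, 0.3, -0.8, 0.5, -0.2, -1.0]
# Noise-free toy "hydrolysate" profile built as an additive mixture.
hydrolysate = [0.5 * r + 1.0 * a + 0.3 * b for r, a, b in zip(rich, comp_a, comp_b)]

beta, r2, adj = fit_additive_model([rich, comp_a, comp_b], hydrolysate)
print([round(v, 3) for v in beta], round(r2, 3))

# Interaction term: element-wise product of two component profiles.
interaction = [a * b for a, b in zip(comp_a, comp_b)]
beta_i, r2_i, adj_i = fit_additive_model([rich, comp_a, comp_b, interaction], hydrolysate)
```

Because the toy profile is purely additive, the interaction coefficient comes out at zero. With real data the fit is imperfect, and comparing adjusted r² before and after adding an interaction column is the kind of comparison summarized by the change from 0.880 to 0.893 above.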

Our linear model does not fully explain the fitness profile of
real hydrolysates. There are still several outliers for which the
fitness defect predicted by our model is less severe than the
defect observed in real hydrolysate (Figure 5A). Two of
these outliers (ZMO0760-ZMO0759) form an operon that
encodes a putative GloAB glutathione-dependent enzyme
system, which is required for detoxification of methylglyoxal
(or other 2-oxoaldehydes) in a wide variety of organisms
(Ozyamak et al, 2010). Typically, methylglyoxal is formed

© 2013 EMBO and Macmillan Publishers Limited 



during unbalanced metabolism (Freedberg et al, 1971), which
suggests that growth on hydrolysate leads to a methylglyoxal
stress that is detoxified by the GloAB system. However, the
Z. mobilis genome appears to lack a methylglyoxal synthase
gene; thus, it is not clear why Z. mobilis needs a GloAB enzyme
system or whether significant amounts of methylglyoxal can be
formed intracellularly during unbalanced metabolism. The
other outliers in our model encode a predicted efflux pump
of unknown function (ZMO1429-ZMO1432) and a sodium/
hydrogen exchanger (ZMO0846) (Figure 5A). ZMO0846 is a
member of the KefB superfamily and may play a role in pH
regulation. The presence of these outliers together with gloAB
strongly suggests that we are missing critical hydrolysate
components.
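Finding outliers in the regression amounts to ranking genes by how much lower their observed hydrolysate fitness is than the model's prediction. The sketch below standardizes the residuals and flags strongly negative ones; the z-score cutoff of -2, the gene names, and the toy profiles are illustrative assumptions, not the criterion used in this study.

```python
from statistics import mean, pstdev


def model_outliers(observed, predicted, z_cut=-2.0):
    """Return genes whose observed fitness is much lower than predicted.

    Residuals (observed - predicted) are converted to z-scores using their
    mean and population standard deviation; genes below z_cut are flagged."""
    genes = sorted(observed)
    resid = [observed[g] - predicted[g] for g in genes]
    mu, sd = mean(resid), pstdev(resid)
    return [g for g, r in zip(genes, resid) if (r - mu) / sd < z_cut]


# Hypothetical data: the model predicts -0.5 for every gene, but the
# mutant in g07 is far sicker in real hydrolysate than predicted.
predicted = {f"g{i:02d}": -0.5 for i in range(1, 11)}
observed = dict(predicted, g07=-5.5)

print(model_outliers(observed, predicted))  # ['g07']
```

Only negative residuals matter here because the question is which mutants are sicker than the known components can explain, which is how the gloAB pair pointed toward a missing inhibitor.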
Table II Table of 99 S. cerevisiae hydrolysate tolerance genes identified in this
study
Table II (Continued)



Class        ORF       Gene

Other        YNL246W   VPS75
Other        YML097C   VPS9
Other        YNL107W   YAF9
PPP          YJL121C   RPE1
PPP          YLR354C   TAL1
PPP          YPR074C   TKL1
PPP          YNL241C   ZWF1
Regulatory   YPL202C   AFT2
Regulatory   YDR173C   ARG82
Regulatory   YIR023W   DAL81
Regulatory   YOL051W   GAL11
Regulatory   YGR163W   GTR2
Regulatory   YOL012C   HTZ1
Regulatory   YDR162C   NBP2
Regulatory   YDR195W   REF2
Regulatory   YDR028C   REG1
Regulatory   YDL020C   RPN4
Regulatory   YHR178W   STB5
Regulatory   YML007W   YAP1
Secretion    YGL054C   ERV14
Secretion    YML121W   GTR1
Secretion    YOR070C   GYP1
Secretion    YDR137W   RGP1
Secretion    YLR039C   RIC1
Secretion    YMR183C   SSO2
Secretion    YOL018C   TLG2
Secretion    YLR262C   YPT6



Functional class (Class), systematic gene name (ORF), and gene name (Gene) 
are listed. The six broad functional classes are AA biosynthesis (amino-acid 
biosynthesis), membrane, secretion, OCA (Oxidant-induced Cell-cycle Arrest), 
PPP (pentose phosphate pathway), and regulatory. These classes are based on 
GO annotations obtained from the Saccharomyces Genome Database
(http://www.yeastgenome.org). See Supplementary Dataset 2 for detailed annotations.

To investigate this, we obtained fitness profiles in methyl- 
glyoxal and a number of additional compounds and stress 
conditions that might result from hydrolysate exposure or from 
unbalanced metabolism (Supplementary Table 5). For example,
furfural is known to induce the formation of reactive
oxygen species (ROS) (Allen et al, 2010), so we tested two types
of oxidative stress (hydrogen peroxide and sodium hypochlorite).
Glycolaldehyde was recently reported to be a new
component of hydrolysate, so we added it to our list of
conditions (Jayakody et al, 2011). In total, we performed
Z. mobilis pooled fitness assays in 11 additional conditions,
including salt stress (KCl, NaCl), oxidative stress (hydrogen
peroxide, sodium hypochlorite), glycerol, acetaldehyde,
methylglyoxal, glycolaldehyde, and organic acids (acetic,
formic, levulinic) at pH 6 to match the pH of our hydrolysate.
The fitness data for these additional conditions are included in
Figures 2 and 3B and Supplementary Dataset 1. Adding
methylglyoxal to the regression significantly improved the fit
(adjusted r² = 0.903; P<10"^^, ANOVA) and explained the
two outliers in the GloAB system (Model-17 in Figure 5B and
Supplementary Table 7). Adding all of the extra components,
but removing insignificant ones, further improved the adjusted
r² to 0.919 (Model-24 in Figure 5C and Supplementary
Table 7). In Model-24, the largest improvement in prediction was
for ZMO0846, which tends to be sensitive to organic acids at
pH 6 but not in our unbuffered organic acid experiments
(Figure 3B), consistent with its predicted role in pH regulation.
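Testing whether an added term such as methylglyoxal significantly improves the regression is, in the usual formulation, an ANOVA F-test between nested linear models. The sketch below computes the F statistic, assuming the residual sums of squares (RSS) of both fits are available; the RSS values, the gene count, and the parameter counts in the example are hypothetical.

```python
def nested_f_test(rss_reduced, rss_full, k_reduced, k_full, n):
    """F statistic comparing a reduced linear model with a full model that
    contains all of its terms plus extra ones.

    k_reduced and k_full count every fitted coefficient (intercept included);
    n is the number of observations (genes). Returns (F, df1, df2)."""
    df1 = k_full - k_reduced
    df2 = n - k_full
    f_stat = ((rss_reduced - rss_full) / df1) / (rss_full / df2)
    return f_stat, df1, df2


# Hypothetical example: one extra term added to a model with an intercept,
# a baseline term, and 16 components, fit to 1586 genes.
f_stat, df1, df2 = nested_f_test(rss_reduced=120.0, rss_full=100.0,
                                 k_reduced=18, k_full=19, n=1586)
print(round(f_stat, 1), df1, df2)  # 313.4 1 1567
```

The P-value is the upper tail of the F distribution with (df1, df2) degrees of freedom; if SciPy is available, `scipy.stats.f.sf(f_stat, df1, df2)` returns it.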

Our modeling results suggested that methylglyoxal might be 
present in our plant hydrolysates. We reanalyzed our batch 2 
hydrolysate sample using a modified derivatization protocol
and detected 41.4 µg/ml (0.56 mM) methylglyoxal (see Materials
and methods). To our knowledge, methylglyoxal has not
been previously reported in dilute-acid plant hydrolysates; 
however, the formation of methylglyoxal from glucose 
degradation has been detected in a model buffer system, and 
after supercritical water treatment of cellulose derived from 
red cedar (Thornalley et al, 1999; Nakata et al, 2006).
To examine the impact of methylglyoxal in the context of 
hydrolysate, we measured the genome-wide fitness profile 
of SYN-10 alone or with 0.56 mM methylglyoxal added and 
compared it with the fitness profile of genuine hydrolysate. 
Although SYN-10 with methylglyoxal had a poorer fit to fitness 
in hydrolysate than SYN-10 did (r² = 0.600 versus 0.669),
Figure 6 Overexpression of ZMO1875 improves ethanol productivity in the
presence of miscanthus hydrolysate. Batch fermentation profile of the Z. mobilis
supplemented with 8% (v/v) batch 2 miscanthus hydrolysate (HZ). A control
shown for comparison (data taken from Figure 1B). Data shown are the average
of four replicates and error bars indicate standard deviation. 



on the complementation data, we then selected four
strains for detailed fermentation studies to measure ethanol
productivity in the presence of miscanthus batch 2
hydrolysate (WT + PBAD-ZMO1722, WT + PBAD-ZMO1875,
WT + PBAD-ZMO0760, and WT + PBAD-ZMO0100). Small-scale
aerobic batch fermentations were performed to determine
specific ethanol productivity (g/l/h/OD600). We found that
overexpression of ZMO1875 improved hydrolysate tolerance
and increased specific ethanol productivity 2.4-fold (0.38
versus 0.16 g/l/h/OD600), whereas overexpression of
ZMO1722, ZMO0760, and ZMO0100 had no significant effect
on growth or ethanol production (Figure 6; Supplementary
Figure 16). Glucose is fully consumed in both the wild-type
and overexpression strains, yet the wild-type strain makes
both less biomass and less ethanol. This suggests that the
improvement in ethanol productivity in the ZMO1875
overexpression strain is due to a metabolic shift resulting in the
production of fewer byproducts (Amin et al, 1983; Yang et al,
2009b).
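Specific ethanol productivity normalizes the volumetric rate of ethanol production to cell density. The sketch below assumes the simplest form, ethanol produced over an interval divided by the interval length and by OD600; the exact time window and normalization used in Materials and methods may differ, and the example inputs to `specific_productivity` are hypothetical. The 2.4-fold figure follows directly from the two reported values.

```python
def specific_productivity(ethanol_g_per_l, hours, od600):
    """Ethanol produced (g/l) per hour, normalized to OD600 (g/l/h/OD600)."""
    return ethanol_g_per_l / hours / od600


def fold_change(treated, control):
    """Ratio of two specific productivities."""
    return treated / control


# Hypothetical inputs: 3.8 g/l ethanol made in 5 h by a culture at OD600 = 2.0.
print(round(specific_productivity(3.8, 5.0, 2.0), 2))  # 0.38

# Values reported in the text for ZMO1875 overexpression versus wild type.
print(round(fold_change(0.38, 0.16), 1))  # 2.4
```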



the addition of methylglyoxal recapitulated the fitness defects
of the gloAB operon (ZMO0759 and ZMO0760; compare
arrows in Figures 5D and 5E). This suggests that ZMO0759
and ZMO0760 are important for growth in real hydrolysate
because they are directly involved in methylglyoxal detoxification.
Finally, we tested the effect of 0.56 mM methylglyoxal
on the growth of wild-type Z. mobilis and found that
adding it at this hydrolysate-relevant concentration caused
a small but significant growth defect (Supplementary
Figure 18; unpaired t-test, P<10~^). In sum, we used our
modeling, fitness, and growth experiments to identify methylglyoxal
as a previously unknown component of hydrolysate
and to demonstrate that it contributes to overall hydrolysate
toxicity in Z. mobilis.
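The unpaired t-test compares mean growth with and without methylglyoxal. Below is a minimal sketch of Student's two-sample t statistic, assuming equal variances (a pooled-variance test); whether the study used this form or Welch's variant is not stated here, and the replicate values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t(a, b):
    """Student's two-sample t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Positive t means mean(a) > mean(b)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical growth rates (1/h) of wild-type replicates.
untreated = [0.50, 0.52, 0.51, 0.49]
methylglyoxal = [0.46, 0.47, 0.45, 0.46]
t, df = unpaired_t(untreated, methylglyoxal)
print(round(t, 2), df)
```

With SciPy installed, `scipy.stats.ttest_ind(a, b)` computes the same statistic together with a two-sided P-value; passing `equal_var=False` gives Welch's test instead.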



Rational engineering of Z. mobilis for improved 
fermentation performance 

We hypothesized that overexpression of putative tolerance
genes in Z. mobilis might improve its growth and ethanol
production in hydrolysate. To test this idea, we systematically
overexpressed 21 tolerance genes (Supplementary Table 6),
which were selected as representatives of the
various functional classes we identified. These genes were
overexpressed using an arabinose-inducible PBAD promoter and
a broad-host-range plasmid system that we developed for this
purpose (Supplementary Figure 14). Each tolerance gene was
tagged with an N-terminal FLAG tag (DYKDDDDK) that allowed
us to examine relative protein levels by western blot. We first
screened our overexpression strains for proteins of the correct molecular
weight and for lack of significant protein degradation
(Supplementary Figure 15). Based on these data, and on our
prior verification of their negative fitness phenotypes, we
selected 10 transposon mutant strains for complementation
studies (Supplementary Table 6). For these experiments, we
asked whether each PBAD expression construct was sufficient to
complement the fitness defect of the corresponding transposon
mutant (e.g., see Supplementary Figure 10). Based
Discussion

Deconstructing a complex chemical stress 
using chemogenomic profiling 

In this study, we present a combined experimental and
computational approach to address two challenges: (1) to
understand how a complex chemical stress affects the growth
and fermentation of Z. mobilis and S. cerevisiae and (2) to
rationally engineer Z. mobilis for increased ethanol production
in plant hydrolysate. Using chemogenomic profiling, we
identified hydrolysate tolerance genes in Z. mobilis and
S. cerevisiae, and we used this information to rationally improve
the fermentation performance of Z. mobilis in miscanthus
hydrolysate. By modeling the Z. mobilis hydrolysate fitness
data and then examining outliers in the regression, we
identified methylglyoxal as a previously unknown component of
miscanthus hydrolysate that contributes to its toxicity.
Although we have focused on miscanthus and switchgrass 
hydrolysates prepared using dilute acid at high temperature, 
our experimental approach is generally applicable to any 
plant hydrolysate regardless of method of pretreatment and 
hydrolysis. 

To our knowledge, this study is the first use of chemoge- 
nomic profiling to deconstruct the biological response 
to a highly complex chemical mixture. Previous large-scale 
studies in yeast have used fitness profiling to understand the 
mechanism of action of single compounds (Jansen et al, 2009;
Cokol et al, 2011). Similarly, most previous studies of 
hydrolysate inhibitors in bacteria and yeast have focused on 
single compounds or simple binary mixtures (Palmqvist et al, 
1999; Zaldivar and Ingram, 1999; Zaldivar et al, 1999, 2000; 
Klinke et al, 2003; Oliva et al, 2004). Only a few studies have 
looked at more complex inhibitor mixtures or at mixtures of 
hydrolysate fractions (Clark and Mackie, 1984; Koppram et al, 
2012). Here, we examined synthetic mixtures of up to 
37 inhibitors (SYN-37), greatly extending previous work, and 
determined that for Z. mobilis SYN-37 is a reasonable proxy
(r² = 0.810) for a dilute-acid miscanthus hydrolysate.
Although not tested, adding methylglyoxal to the SYN-37
mixture might further improve this correlation. Our Model-17
results (adjusted r² = 0.903) suggest that it may be possible to
recapitulate the biological effects of hydrolysate with a mixture 
of only 17 compounds. Thus, our work provides a good 
starting point for developing new synthetic hydrolysate 
mixtures that mimic the real material and for enabling the 
rational engineering of hydrolysate tolerance. 

Because our high-throughput fitness protocols were devel- 
oped for aerobic conditions, most of the experiments in this 
study were performed in the presence of oxygen. However, we 
recognize that most industrial biofuel fermentations will likely 
be microaerobic or anaerobic; therefore, we performed
anaerobic hydrolysate experiments in Z. mobilis to determine
whether the tolerance genes we identified in this study can be
used to engineer tolerance under these growth conditions.
Using the same criterion as for identifying aerobic tolerance genes
(fitness_hydrolysate < -1 and fitness_hydrolysate < fitness_rich - 1),
we identified 11 genes that are important for growth in
anaerobic hydrolysate (Supplementary Figure 17). Four of
these genes were also found in our aerobic studies (ZMO0100,
ZMO0101, ZMO0759, and ZMO1490). In addition to these four
genes, we identified seven new anaerobic tolerance genes
(ZMO1015, ZMO1016, ZMO1017, ZMO1018, ZMO1355,
ZMO1548, and ZMO1556), which provide a basis for future
engineering of anaerobic hydrolysate tolerance. These results
emphasize the need to match laboratory hydrolysate tolerance
studies with the specific growth and environmental conditions
of an industrial-scale cellulosic biofuel process. However, our
approach for dissecting a complex chemical stress is general,
and by collecting fitness data for the 37 components under
anaerobic conditions, it should be possible to model anaerobic
hydrolysate stress.



Mechanisms of hydrolysate tolerance in 
Z. mobilis and S. cerevisiae 

In both Z. mobilis and S. cerevisiae, we identified several
broad categories of gene functions required for hydrolysate
tolerance. We also identified many genes of unknown
function, or unrelated to any previously known tolerance
mechanism, demonstrating that our experimental approach is
a rich source of new knowledge for understanding the
biological response to a complex chemical stress. Of the 44
tolerance genes we identified in Z. mobilis, only 1 (ZMO1432)
has previously been reported, in patent WO 2012/082711 A1
(Caimi and Hitz, 2012). In that patent, the authors identified a point
mutation in ZMO1432 after evolving Z. mobilis for improved
fermentation performance in hydrolysate. ZMO1432 is part
of a four-gene operon (ZMO1429-ZMO1432) that encodes a
predicted efflux pump. In our study, transposon insertions in
all four of these genes caused sensitivity to hydrolysate,
which suggests that efflux of inhibitory compounds is an
important mechanism of hydrolysate tolerance. Upregulation
of efflux pump genes has recently been reported in a
transcriptome study of E. coli growth in corn stover hydrolysate
(Schwalbach et al, 2012). In S. cerevisiae, only 9 of the 99
tolerance genes that we identified were previously reported in
single-inhibitor or hydrolysate tolerance studies (BAP2, ERG2,
GTR2, LSM6, RPN4, TAL1, TKL1, YAP1, and ZWF1) (Jeppsson
et al, 2003; Gorsich et al, 2006; Kawahata et al, 2006; Ma and
Liu, 2010; Sundstrom et al, 2010; Liu, 2011; Pereira et al, 2011;
Sanda et al, 2011; Gao and Xia, 2012; Hueso et al, 2012).
Broadly, there is little overlap between the genes we identified
and those reported in previous tolerance studies in Z. mobilis, E. coli, and
S. cerevisiae (Petersson et al, 2006; Almeida et al, 2007; Miller
et al, 2009b, 2010; Yang et al, 2010a, b; Parawira and Tekere,
2011; Drobna et al, 2012), which likely reflects the underlying
genetic complexity of tolerance, the different experimental
protocols used for tolerance gene identification, and the
different plant feedstocks and methods used for hydrolysate
preparation. However, despite these differences, we did
identify genes in two pathways (oxidative stress response
and amino-acid biosynthesis) that overlap with previous
studies, which suggests a fundamental role for these pathways
in hydrolysate tolerance in bacteria and yeast (Miller et al,
2009a; Allen et al, 2010; Warner et al, 2010). In addition,
our work in Z. mobilis has uncovered genes involved in
sulfate assimilation and iron-sulfur (Fe-S) cluster assembly
and repair that represent new potential gene targets for strain
engineering.



Oxidative stress response 

In both Z. mobilis and S. cerevisiae, we identified tolerance
genes involved in the oxidative stress response, which suggests
that growth in hydrolysate induces the intracellular formation
of ROS, including hydrogen peroxide, superoxide anion,
and hydroxyl radicals. Although not tested, these ROS are
unlikely to be present in our plant hydrolysates because of their
chemical instability. Furfural, an abundant component
of plant hydrolysates, can induce the formation of ROS in
S. cerevisiae, which provides a direct link between oxidative
stress and hydrolysate toxicity (Allen et al, 2010). In our
Z. mobilis gene fitness data, we identified CCP (ZMO1136)
and a number of genes involved in cytochrome c biogenesis
(ZMO1252, ZMO1253, ZMO1255, and ZMO1389) that are
important for growth in hydrolysate. CCP converts hydrogen
peroxide to water (Mishra and Imlay, 2012) and likely has a
direct role in hydrolysate tolerance by reducing the levels of
this ROS. Consistent with previous studies of Z. mobilis CCP
(Charoensuk et al, 2011), we find that mutants in ZMO1136 are
sensitive to hydrogen peroxide (average fitness in rich
media = 0.003; average fitness in hydrogen peroxide = -3.82;
Supplementary Dataset 1). Previous work in E. coli identified
three genes involved in peroxide detoxification (ahpC, tpx, and
hep) that were important for growth in corn stover hydrolysate
(Warner et al, 2010). In our S. cerevisiae hydrolysate fitness
data, we identified YAP1, a known regulator of the
oxidative stress response, including the response to peroxides
(Veal et al, 2003; Drobna et al, 2012), which was previously found to
be important for growth in 5-HMF, an abundant component of
hydrolysate (Ma and Liu, 2010). We also identified two
transcriptional targets (GSH1 and CYS3) of the YAP1 regulator
(Nisamedtinov et al, 2011), further implicating the YAP1
pathway. In addition, we identified five OCA genes (OCA1,
OCA2, OCA4, OCA5, and OCA6) that are important for growth
in hydrolysate and are involved in the repair of lipids after
oxidative damage (Alic et al, 2001). Together, our data suggest
that in both bacteria and yeast, growth in hydrolysate leads to
the formation of intracellular peroxides and that peroxide
detoxification is an important mechanism of hydrolysate
tolerance.



Amino-acid biosynthesis 

We also implicated amino-acid biosynthesis as a mechanism of
hydrolysate tolerance in both Z. mobilis and S. cerevisiae. We
found that both homoserine dehydrogenase (ZMO0483 or
HOM6), which is required for methionine biosynthesis, and
glutamine amidotransferase (ZMO0201 or TRP3), which is
required for tryptophan biosynthesis, are important for growth
in hydrolysate. In addition, we identified four tolerance genes
involved in sulfate assimilation, which also suggests a role for
cysteine biosynthesis in hydrolysate tolerance, although this
pathway may play other roles in hydrolysate tolerance (see
next section). Our work is consistent with previous studies in
E. coli which found that addition of cysteine and methionine to
the growth media helped alleviate furfural, acetic acid, and
hydrolysate toxicity (Roe et al, 2002; Miller et al, 2009a; Nieves
et al, 2011; Sandoval et al, 2011). In E. coli, reduction of furfural
depletes NADPH, which limits sulfate assimilation by the
NADPH-dependent enzyme sulfite reductase encoded by cysIJ.
Acetate appears to block methionine biosynthesis downstream
of homocysteine, which leads to accumulation of this toxic
intermediate (Roe et al, 2002). Our studies also suggest that
addition of methionine and cysteine might improve hydrolysate
tolerance; however, our fitness experiments were
conducted in rich media (ZRMG or YPD), which should contain
high levels of these amino acids, in contrast to previous studies
that were conducted in minimal media (Nieves et al, 2011).
It is not clear why we identified amino-acid biosynthesis genes
in our rich media growth conditions, but the overlap with
previous studies strongly suggests a fundamental role for
amino-acid biosynthesis in hydrolysate tolerance in both
bacteria and yeast.



Sulfate assimilation 

Our fitness data in Z. mobilis also suggest that growth in 
hydrolysate induces ROS that lead to an increased demand for 
cysteine biosynthesis and for sulfide. We identified four 
Z. mobilis auxotrophs in the sulfate assimilation pathway
(ZMO0003, ZMO0007, ZMO0008, and ZMO0009), encoding
CysC, CysH, CysI, and CysJ, respectively, which are needed for
de novo cysteine biosynthesis and are important for growth in
hydrolysate. In Salmonella typhimurium, the CysB regulon is
induced by oxidants, such as hydrogen peroxide or menadione,
and cysCIJ mutants have reduced levels of glutathione
and induce an oxidative stress response (Turnbull and Surette,
2010). Similarly, growth of Z. mobilis in hydrolysate may lead
to reduced levels of glutathione, which is synthesized from
glutamate, cysteine, and glycine; this might explain the increased
demand for cysteine. In addition, the sulfate assimilation 
pathway also functions to provide sulfur for assembly of Fe-S 
clusters; thus, it is likely that this pathway has multiple roles in 
hydrolysate tolerance. 
Fe-S clusters

Fe-S clusters are essential enzyme prosthetic groups that can 
be damaged by ROS (Djaman, 2004). A number of pathways 
exist in E. coli for the repair of Fe-S clusters that have been 
damaged by oxidative stress (Yang, 2002; Djaman, 2004; 
Bitoun et al, 2008). We identified four tolerance genes in
Z. mobilis (ZMO0429, ZMO1067, ZMO1874, and ZMO1875) that
provide a causal link between growth in hydrolysate,
oxidative stress, and Fe-S clusters. In Z. mobilis, ZMO0429
and ZMO1067 encode predicted Fe-S assembly proteins, and
ZMO1874 encodes a predicted BolA family member, which has
also been linked to Fe-S cluster assembly (Li and Outten,
2012); however, ZMO1875 has no predicted function.
The fitness profiles of ZMO1067::Tn5, ZMO1874::Tn5,
and ZMO1875::Tn5 are similar and cluster together
(Figure 3B), suggesting a shared function in Fe-S cluster repair
and hydrolysate tolerance.

BolA family members have recently been shown to interact 
with GrxD family monothiol glutaredoxins in both E. coli and 
S. cerevisiae (Huynen et al, 2005; Koch and Nybroe, 2006; 
Rouhier et al, 2010; Shakamuri et al, 2012). These complexes 
directly bind Fe-S clusters, and are thought to function in 
both Fe-S cluster assembly and regulation of new Fe-S 
cluster synthesis (Kumanovics et al, 2008; Cameron et al, 
2011; Shakamuri et al, 2012; Willems et al, 2012). Consistent 
with this role, ZMO1874 is in the same operon as a predicted 
monothiol glutaredoxin (ZMO1873) and an Fe-S-containing 
enzyme, quinolinate synthetase (ZMO1871), which is involved 
in NAD biosynthesis. The ZMO1874-ZMO1873 complex 
may function to assemble Fe-S clusters in quinolinate 
synthetase or to regulate its activity. 

ZMO1875 encodes a protein of unknown function (DUF1476), 
which, in this study, was used to improve ethanol productivity 
in the presence of hydrolysate. One recent study identified a 
DUF1476 protein that functions as an inhibitory subunit of the 
F0F1 ATP synthase complex (Morales-Rios et al, 2010). It is not 
clear how inhibiting ATP synthase would improve hydrolysate 
tolerance, and ATP synthase is not consistently detrimental to 
fitness in hydrolysate. Instead, we propose that ZMO1875 
functions together with BolA-GrxD-like complexes in 
Z. mobilis (ZMO1874-ZMO1873) to assemble Fe-S clusters in 
quinolinate synthetase or to regulate the de novo 
biosynthesis of new Fe-S clusters that replace those 
damaged by ROS. To our knowledge, this is the first example 
of targeting Fe-S cluster pathways to engineer 
improved hydrolysate tolerance. 



Synergistic effects of inhibitors 

Previous studies have explored synergistic interactions 
between hydrolysate inhibitors, based on effects on growth 
and fermentation (Palmqvist et al, 1999; Zaldivar and Ingram, 
1999; Zaldivar et al, 1999, 2000). Some combinations of 
compounds inhibit growth or ethanol production more than 
one would expect from the activity of the compounds 
individually. The mechanisms behind this synergistic inhibition 
of growth are not well understood. More broadly, 
synergistic inhibition of growth appears to be common but 
not predominant: a recent study found that 38 of 200 pairs of 
antifungal drugs synergistically inhibited the growth of 
S. cerevisiae (Cokol et al, 2011). 

Here, we focused on a related but different question: do 
inhibitors have synergistic effects on a mutant's growth 
relative to wild type? We found that gene fitness on hydrolysate 
can be modeled as a linear combination of gene fitness 
on the individual compounds. Furthermore, although some 
genes were important for fitness in plant hydrolysate without 
being important for fitness on any of the known components, 
these genes were not important for fitness in synthetic 
hydrolysate mixtures, which suggests that they are important 
for resisting unidentified components. Similarly, we did not 
identify any genes with much lower fitness in the defined 
mixtures SYN-10 or SYN-37 than in any of their components 
(the biggest reductions in fitness, relative to the minimum on 
the components, were -1.1 and -0.6, respectively). Finally, we 
considered adding interaction terms of the form X × Y to our 
regression. These would be necessary if combining inhibitors 
converts some mutants' mild phenotypes into severe ones. 
Although we found interaction terms that were statistically 
significant, adding them did not lead to a notable improvement 
in the fit of the model. 



Implications for rational engineering of 
hydrolysate tolerance 

The success of our linear model for predicting hydrolysate 
fitness suggests that we can separately engineer tolerance to 
individual components and then combine these improvements 
into a single strain. The successful evolution of an 
S. cerevisiae strain for improved fermentation in spruce 
hydrolysate by adaptation to a cocktail of 12 inhibitors is consistent 
with our hypothesis that the biological response to hydrolysate 
can be understood as the combination of responses to the 
individual compounds (Koppram et al, 2012). A combinatorial 
engineering approach has already proved successful for 
improving tolerance to a binary mixture of inhibitors 
(Sommer et al, 2010). Sommer et al identified three 
genes, recA, orfX, and ndpE, that conferred resistance to either 
2-furoic acid or syringaldehyde; when these three genes were 
co-expressed, the combination conferred tolerance to the 
mixture of both compounds. Based on these results and our 
data, we believe that a strain engineered for improved 
tolerance to the main chemical classes found in hydrolysate, 
such as weak organic acids, phenolics, aldehydes, and furans, 
might be sufficient to confer tolerance to the full, complex 
chemical mixture present in real plant hydrolysates. For 
example, two well-known tolerance mechanisms, furfural 
reduction and laccase detoxification of phenolics (Parawira 
and Tekere, 2011), could be combined into a single strain for 
further improvements in hydrolysate tolerance. In addition, 
based on our work, this strain could be further engineered 
by modifying pathways for peroxide detoxification 
and Fe-S cluster repair. Complex epistatic 
relationships could exist between tolerance gene pathways, making 
combinatorial engineering more difficult (Sandoval et al, 
2012); however, to a first approximation, our linear fitness 
model and previous work (Wood et al, 2012) argue against 
this possibility. 
Conclusions 

In this study, we have shown that chemogenomic profiling and 
modeling can be used to deconstruct a complex chemical 
stress and that this information can be used to rationally 
engineer a strain for improved fermentation in plant 
hydrolysate. More broadly, our approach can be used to model 
other complex stresses, such as the array of environmental 
stresses (including product toxicity) encountered by microbes 
inside an industrial bioreactor (Gibson et al, 2007, 2008). 
Understanding how the production host deals with this 
complex environmental stress has significant implications 
for improving the yield and productivity of any industrial-scale 
fermentation process. 

Materials and methods 

Strains, media, plasmids, and primers 

Bacterial strains, primers, and plasmids used in this study are listed in 
Supplementary Table 6. Zymomonas mobilis strain ZM4 obtained from 
ATCC (ATCC 31821) was the parent strain for our studies. E. coli strains 
TOP10 (Invitrogen), NEB 5-alpha (New England Biolabs), and 
WM3064 (W. Metcalf, University of Illinois at Urbana-Champaign) 
were used as needed. Z. mobilis was cultured in ZRMG, Zymomonas 
Rich Medium Glucose (25 g/l glucose, 10 g/l yeast extract, and 
2 g/l KH2PO4), or ZMMG, Zymomonas Minimal Medium Glucose 
(Goodman et al, 1982), and grown aerobically at 30°C. Anaerobic 
growth was performed in sealed Hungate tubes after degassing media 
with nitrogen or on ZRMG agar plates in an anaerobic chamber (Coy 
Lab Products). For Z. mobilis, plates or liquid media were 
supplemented with 100 µg/ml kanamycin, chloramphenicol, or 
spectinomycin as necessary. E. coli strains were grown in Luria-Bertani (LB) broth 
at 37°C, supplemented with 30 µg/ml kanamycin, 20 µg/ml 
chloramphenicol, or 50 µg/ml spectinomycin as needed. The S. cerevisiae 
homozygous barcoded deletion mutant collection (Giaever et al, 2002) 
was a gift of Ron Davis (Stanford Genome Technology Center). Yeast 
strains were grown aerobically at 30°C in Difco Yeast Extract Peptone 
Dextrose (YPD) media. 



Preparation and analysis of plant hydrolysates 

A dilute-acid pretreatment method was used to release fermentable 
sugars from plant biomass. M. giganteus (miscanthus) and P. virgatum 
(switchgrass) were grown in Illinois at four different field sites: 
Brownstown (BTN), Fairfield (FRF), Havana (HAV), and Orr (ORR). 
'Batch' hydrolysates were prepared from a mixed miscanthus sample 
that was derived from several field sites (Batch February 2009, EBI 
South 2006 Season, obtained from UIUC). After harvesting, samples 
were air dried, processed with a SM 2000 cutting mill and a 2-mm sieve 
(Retsch, Haan, Germany), and then finally ground to pass a 120-µm 
sieve screen using a SR 200 rotor beater mill (Retsch). Pretreatment 
and hydrolysis were performed in an Ethos Z microwave (Milestone 
Inc., Shelton, CT) equipped with six closed reaction vessels 
(each 100 ml in volume). In brief, each vessel contained 5 g ground 
miscanthus and 45 g of 1% (w/w) sulfuric acid. The temperature was 
increased in 4 min to 180°C (in 4.5 min to 200°C for batch 1 and batch 2) 
and held for 2 min. The mixture was cooled in an ice bath to 95°C, 
the vessels were opened, the contents were rapidly transferred into a 
beaker, and the mixture was further cooled to room temperature. The 
supernatant was collected after centrifugation (5000 × g) and adjusted 
to pH 6.0 with KOH. Samples were filter sterilized, and 10 ml aliquots were 
flash frozen in a dry-ice ethanol bath and stored at -80°C, protected 
from light. Since the stability of compounds present in hydrolysate is 
unknown, hydrolysate aliquots used in pool and growth experiments 
were thawed and used only once. 

The supernatant was analyzed for soluble carbohydrates, organic 
acids and furans using an Agilent 1200 series liquid chromatography 
system (Agilent Technologies, Santa Clara, CA) equipped with a 
refractive index detector and a diode-array detector. Samples were 
injected onto an Aminex HPX-87H column (Bio-Rad, Hercules, CA) 
and compounds were eluted at 50°C and a flow rate of 0.6 ml/min by a 
mobile phase consisting of 0.005 M sulfuric acid. For detection and 
quantification of methylglyoxal, 1 ml of the hydrolysate was mixed 
with 0.5 ml of a solution of 1% orthophenylenediamine in 0.5 M 
sodium phosphate (pH 6.5). The mixture was incubated at room 
temperature in the dark for 16 h, and then 5 µl was injected onto a 
Zorbax SB-C18 Rapid Resolution column (Agilent Technologies) and 
eluted at 40°C by a gradient of solvent A (0.1% (v/v) formic acid) 
and solvent B (acetonitrile containing 0.1% (v/v) formic acid). 
The gradient program was: 5-50% B in 10 min, 50-90% B in 4 min, 
90-5% B in 1 min, then 3 min isocratic elution. Detection was at 
312 nm. A standard of methylglyoxal (40% solution, Sigma-Aldrich, St. 
Louis, MO) was derivatized in the same way and used to confirm the 
retention time and UV spectra. GC/MS was used for analysis of 
phenolic compounds: to 1 ml of neutralized hydrolysate, 30 µl of 72% 
sulfuric acid and 20 µl of internal standard (iso-propylphenol, 1 mg/ml 
in 0.1% sodium hydroxide) were added. This mixture was extracted 
three times, each time by vigorous mixing with 0.5 ml of ethyl acetate. After phase 
separation, the upper ethyl acetate layer was removed and collected. 
All ethyl acetate phases were combined and dried over sodium sulfate. 
An aliquot of the combined dried ethyl acetate phase (100 µl) was 
incubated with 50 µl of N,O-bis(trimethylsilyl)trifluoroacetamide 
containing 1% trimethylchlorosilane (Sigma-Aldrich) at 70°C for 
30 min. Then, 1 µl was injected in splitless mode onto a VF5-MS 
capillary column (Varian, Palo Alto, CA). An Agilent 7890A gas 
chromatograph coupled to an Agilent 5975C single quadrupole mass 
spectrometer was used with the following settings: injector 
temperature 280°C; carrier gas, helium at 1 ml/min; temperature program, 
3 min isocratic at 75°C, 5°C/min to 150°C, 0.5°C/min to 160°C, 2°C/min 
to 190°C, 5°C/min to 240°C, 70°C/min to 325°C, then 3 min isocratic. Ions 
were detected by electron impact ionization (70 eV) in full scan mode 
m/z 35-500. Compounds were identified by matching their mass 
spectra with NIST database entries and by comparing their retention 
times with commercially available standards (Sigma-Aldrich). Peak 
areas were quantified using selected extracted ions for the compounds 
in internal standard calibration mode (m/z 193 for iso-propylphenol). 
We quantified 37 inhibitors and 4 sugars for each plant hydrolysate 
sample (Supplementary Table 1). Only batch 2 hydrolysate was 
analyzed for methylglyoxal. 
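The ramp and hold segments of the GC oven program add up to a fixed run time per injection. As a minimal sketch, assuming the program is exactly as transcribed above and ignoring oven cool-down and re-equilibration between runs (which the text does not specify), the total can be computed by summing ramp and hold times. The `(rate, target, hold)` tuple layout and the function name are choices made for this illustration only.

```python
# Oven program for the phenolics GC/MS method, as (rate in °C/min,
# target temperature in °C, hold time in min at the target).
# A rate of None marks the initial isothermal step.
OVEN_PROGRAM = [
    (None, 75.0, 3.0),   # 3 min isocratic at 75°C
    (5.0, 150.0, 0.0),   # 5°C/min to 150°C
    (0.5, 160.0, 0.0),   # 0.5°C/min to 160°C
    (2.0, 190.0, 0.0),   # 2°C/min to 190°C
    (5.0, 240.0, 0.0),   # 5°C/min to 240°C
    (70.0, 325.0, 3.0),  # 70°C/min to 325°C, then 3 min isocratic
]


def program_runtime(program):
    """Total oven run time (min) of a multi-ramp temperature program."""
    total = 0.0
    current_temp = None
    for rate, target, hold in program:
        if rate is None:
            current_temp = target                    # starting temperature, no ramp
        else:
            total += (target - current_temp) / rate  # time spent ramping
            current_temp = target
        total += hold                                # isothermal hold at the target
    return total


print(f"{program_runtime(OVEN_PROGRAM):.2f} min")
```

For this program the sum is 3 + 15 + 20 + 15 + 10 + 85/70 + 3, or roughly 67.2 min per injection.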



Preparation and analysis of synthetic hydrolysate 
mixtures (SYN-37 and SYN-10) 

Two synthetic hydrolysate mixtures were prepared containing the 
10 most abundant compounds (SYN-10) or the 37 most abundant 
compounds (SYN-37) present in batch 1 miscanthus hydrolysate. 
SYN-37 also contained four sugars (xylose, glucose, arabinose, and 
cellobiose). All chemicals were purchased from Sigma-Aldrich. Each 
mixture was prepared by mixing 1 M DMSO (or Milli-Q water) stock 
solutions directly into 2× ZRMG media; the final solution was then 
adjusted to 1× ZRMG with water and to pH 6.0 with 
KOH. Liquid chemicals such as furfural were added directly to the 
mixture. SYN-37 and SYN-10 were analyzed by LC-RID and GC/MS to 
determine their compositions (Supplementary Table 1). We found that 
the concentrations of vanillin, syringaldehyde, vanillic acid, and furoic 
acid in the SYN-10 mixture were higher than expected. For SYN-37, 
we found that benzoic acid and cellobiose were higher than the values 
in batch 1 hydrolysate. Despite these differences in composition, 
the fitness profiles of SYN-37 and SYN-10 are highly correlated (in 
Z. mobilis, = 0.81; in S. cerevisiae, = 0.86), and we chose to continue 
our studies with these mixtures. In addition, this high correlation 
implies that the presence of three non-metabolizable sugars in SYN-37 
(xylose, arabinose, and cellobiose) versus SYN-10 (glucose only) 
had little effect on their genome-wide fitness profiles. Once made, 
the synthetic mixtures were flash frozen in 10 and 50 ml aliquots and 
stored at -80°C. These stock solutions were considered as 100% by 
volume and used accordingly for both Z. mobilis and S. cerevisiae 
fitness and growth experiments. The final concentration of DMSO in 
SYN-37 was 0.16%. SYN-10 did not contain any DMSO. The 
fitness profile of ZRMG + 2% DMSO was measured as a control for 
any possible effects of DMSO on fitness, but little effect was seen 
(e.g., Figure 4B). 
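Agreement between two genome-wide fitness profiles, such as those for SYN-37 and SYN-10, can be summarized by a correlation computed over the genes measured in both conditions. The text does not state which correlation statistic was used, so the following is only a sketch that assumes a Pearson correlation; the gene names and fitness values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length fitness profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical gene fitness values (log2 END/START) in two synthetic mixtures.
syn37 = {"geneA": -2.1, "geneB": 0.1, "geneC": -0.4, "geneD": -1.5, "geneE": 0.3}
syn10 = {"geneA": -1.8, "geneB": 0.0, "geneC": -0.6, "geneD": -1.2, "geneE": 0.2}

shared = sorted(set(syn37) & set(syn10))  # only genes measured in both conditions
r = pearson_r([syn37[g] for g in shared], [syn10[g] for g in shared])
print(f"r = {r:.2f}")
```

Restricting to shared genes matters because mutant pools rarely yield a fitness value for every gene in every condition.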

Generation of a Z. mobilis barcoded transposon 
library and chemogenomic profiling 

Our laboratory has previously described detailed methods for 
TagModule construction, the generation of a genome-wide barcoded 
transposon library, pooled fitness assays, data normalization and 
analysis in Shewanella oneidensis MR-1 (Oh et al, 2010; Deutschbauer 
et al, 2011). The same methods were used to create a barcoded 
transposon library in Z. mobilis, and perform fitness assays, with 
a few modifications. Each mutant in our Z. mobilis pool contains a 
barcoded transposon, which is a KanR Tn5 containing a TagModule 
(each TagModule contains two 20 bp DNA barcodes, called UPTAG 
and DNTAG). Briefly, to build the transposon mutant collection in 
Z. mobilis, we used a mini-Tn5 delivery system based on the suicide 
plasmid pRL27 (Larsen et al, 2002). Transposons were delivered into 
Z. mobilis by conjugation (plate mating) with E. coli WM3064. 
Kanamycin-resistant colonies were picked and mutants were stored at 
-80°C in 384-well plates. Gene disruptions were mapped using a 
two-step arbitrary PCR method previously described (Deutschbauer et al, 
2011), using the primers (Round 1: pRL27_lE_revl + ARB8, ARBll, 
Round 2: U2_comp + ARB2). A custom Perl program was designed to 
track each transposon insertion and TagModule identity. Perl scripts 
are available upon request. Z. mobilis gene annotations for the 
main chromosome and five plasmids (pZZM401-pZZM405) were 
obtained from RefSeq (http://www.ncbi.nlm.nih.gov/RefSeq/) and a 
recent annotation paper (Yang et al, 2009a). Two Z. mobilis pools 
(up-pool and dn-pool; Supplementary Table 2), containing 7432 strains 
in total and 6302 unique strains, were designed using a custom Perl script. 
These strains were cherry-picked from the 384-well plates 
using a Biomek FxP Liquid Handling Robot (Beckman Instruments) 
into 96-well 'pool plates' and grown to saturation in ZRMG + 100 µg/ml 
kanamycin. Then, 25 µl of each mutant was transferred and pooled 
together. The mixed pool was pelleted by centrifugation and 
resuspended in fresh ZRMG media + 10% glycerol, and 100 µl pool 
aliquots were frozen at -80°C. 

For chemogenomic profiling, the up-pool and dn-pool were 
recovered by thawing 100 µl aliquots and inoculating them into 10 ml of 
ZRMG. Cells were grown for 5 h with shaking at 30°C until they reached an 
OD600 of 0.5. These cultures, called 'START', were used to initiate 
pool experiments, which were done either in 10 ml or in 24-well 
plate format, starting at an OD600 of 0.02. Several concentrations of each 
condition were prescreened until a concentration that showed about 
50% inhibition was identified. A list of experiments, including 
concentrations used for each condition in the fitness data set, is found 
in Supplementary Table 5. After growing the pool for about 5-7 
generations (typically to saturation), cell pellets were collected (called 
'END'), and the genomic DNA was isolated (Qiagen DNeasy Kit or 
Qiagen PureGene Kit). We PCR-amplified the UPTAGs from the up-pool 
and DNTAGs from the dn-pool as previously described 
(Deutschbauer et al, 2011). These PCR products were combined and 
hybridized to an Affymetrix 16K TAG4 array, washed, and scanned as 
previously described (Pierce et al, 2006). The abundance of each 
mutant in the 'END' sample compared with its abundance in the 
'START' sample is used to calculate a log2 fitness ratio (END/START), 
which is called 'strain fitness'. For most genes, 'gene fitness' is defined 
as the average strain fitness value for all 'good' transposon insertions 
(i.e., those within the central 5-80% portion of a gene). We found that the 
strain fitness values for different insertions within the same gene were 
well correlated (for all hydrolysate experiments, n = 37, r² = 0.85; 
the mean absolute difference in fitness was only 0.27), and therefore we 
report only gene fitness values in this paper. 
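The definitions of strain fitness and gene fitness can be made concrete with a short sketch. It assumes that an insertion's position is measured from the start codon (so it depends on the gene's strand) and that the abundances are already-normalized array signals; it omits the normalization steps described below. All coordinates, abundances, and function names are hypothetical.

```python
import math
from statistics import mean


def strain_fitness(start_abundance, end_abundance):
    """Strain fitness: log2 of the END/START abundance ratio of one barcode."""
    return math.log2(end_abundance / start_abundance)


def fraction_into_gene(insert_pos, gene_start, gene_end, strand):
    """Relative position of an insertion, measured from the start codon."""
    length = gene_end - gene_start
    if strand == "+":
        return (insert_pos - gene_start) / length
    return (gene_end - insert_pos) / length


def gene_fitness(strains, gene, lo=0.05, hi=0.80):
    """Mean strain fitness over 'good' insertions (central 5-80% of the gene).

    Returns None when the gene has no good insertions.
    """
    values = [
        strain_fitness(s["start_abund"], s["end_abund"])
        for s in strains
        if lo <= fraction_into_gene(s["pos"], gene["start"], gene["end"], gene["strand"]) <= hi
    ]
    return mean(values) if values else None


# Hypothetical gene on the + strand with three mapped insertions.
gene = {"start": 1000, "end": 2000, "strand": "+"}
strains = [
    {"pos": 1100, "start_abund": 400.0, "end_abund": 100.0},  # 10% in: log2(0.25) = -2
    {"pos": 1500, "start_abund": 300.0, "end_abund": 150.0},  # 50% in: log2(0.5) = -1
    {"pos": 1950, "start_abund": 200.0, "end_abund": 200.0},  # 95% in: excluded
]
print(gene_fitness(strains, gene))
```

Only the first two insertions fall inside the 5-80% window, so the gene fitness here is the mean of -2 and -1, that is, -1.5.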

Data normalization and calculation of gene fitness values was done 
using a modification of our previous method (Oh et al, 2010; 
Deutschbauer et al, 2011). First, we set the median of strain fitness 
from each scaffold (main chromosome and five plasmids) to zero. 
This corrected for differential efficiency of DNA extraction between the 
main chromosome and five plasmids. In addition, for the main 
chromosome we used a smooth estimator (loess in R) to remove a 
small effect of chromosomal position on tag abundance (this was 
probably due to increased copy number near the origin). Finally, 
for the main chromosome, we also set the mode of the strain 
fitness distribution to zero. We did this because the median fitness 
was generally below the mode (the mode was estimated as the 
maximum of the kernel density, using the density function in R) and 
because the mode for all proteins matched the mode for hypothetical 
proteins, which should be less likely to have phenotypes. Data analysis 
was done using custom R scripts (available upon request). The full 
data set of conditions tested and fitness values is found in 
Supplementary Dataset 1. 
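The median and mode centering steps can be sketched in pure Python. This is not the authors' R code: the loess correction for chromosomal position is omitted, and the kernel density uses a fixed Gaussian bandwidth of 0.1 evaluated on a grid, whereas R's `density` chooses its bandwidth automatically by default. The scaffold name `plasmid1` and the fitness values are hypothetical.

```python
import math
from statistics import median


def center_by_scaffold(fitness, scaffolds):
    """Shift strain fitness so that each scaffold's median is zero."""
    groups = {}
    for f, s in zip(fitness, scaffolds):
        groups.setdefault(s, []).append(f)
    medians = {s: median(values) for s, values in groups.items()}
    return [f - medians[s] for f, s in zip(fitness, scaffolds)]


def kde_mode(values, bandwidth=0.1, grid_step=0.01):
    """Grid location of the maximum of a Gaussian kernel density estimate."""
    lo = min(values) - 3 * bandwidth
    hi = max(values) + 3 * bandwidth
    best_x, best_density = lo, -1.0
    for i in range(int((hi - lo) / grid_step) + 1):
        x = lo + i * grid_step
        # Unnormalized density: a constant factor does not move the maximum.
        density = sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
        if density > best_density:
            best_x, best_density = x, density
    return best_x


def normalize(fitness, scaffolds, main="chromosome"):
    """Median-center every scaffold, then mode-center the main chromosome only."""
    centered = center_by_scaffold(fitness, scaffolds)
    mode = kde_mode([f for f, s in zip(centered, scaffolds) if s == main])
    return [f - mode if s == main else f for f, s in zip(centered, scaffolds)]


# Hypothetical strain fitness values on the chromosome and one plasmid.
fitness = [-2.0, 0.1, 0.2, 0.3, 1.0, 5.0, 5.5]
scaffolds = ["chromosome"] * 5 + ["plasmid1"] * 2
print([round(v, 2) for v in normalize(fitness, scaffolds)])
```

The grid search trades precision (to within one grid step) for simplicity; a finer `grid_step` sharpens the mode estimate at proportionally higher cost.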

We found that a putative prophage region (ZMO1920-ZMO1952) 
contained 18 genes that were often detrimental for fitness (vertical 
yellow stripe in Figure 2, a little to the right of center); however, 
they were not specifically detrimental to fitness in hydrolysate, so 
we did not study them further. In our comparative genome 
hybridization data, this region appeared to have variable copy number 
(Supplementary Figure 4H). The positive fitness of these genes could 
be an artifact: if the prophage increases its copy number when the cell 
is stressed (Imae and Fukasawa, 1970), then the barcodes in these 
strains will be amplified and their fitness values will be positive, even 
though those cells have not increased in abundance. 

Modeling hydrolysate fitness using component 
fitness data 

We modeled the average gene fitness in hydrolysate as a linear mixture 
of gene fitness in ZRMG-rich media and in various stresses, 
for example, for each gene g: 

$$f_{\text{hydrolysate}}(g) = C + \beta_{\text{rich}} \times f_{\text{rich}}(g) + \beta_{\text{furfural}} \times f_{\text{furfural}}(g) + \beta_{\text{acetic acid}} \times f_{\text{acetic acid}}(g) + \cdots$$

where $f_{\text{rich}}$ is the average gene fitness in rich media across 24 
experiments and the parameters $C$, $\beta_{\text{rich}}$, $\beta_{\text{furfural}}$, $\beta_{\text{acetic acid}}$, etc., were 
computed using a standard linear regression (lm in R) to best fit the 
average gene fitness in hydrolysate across 37 experiments. We started 
from a model with 37 components and used ANOVA to identify 
components that did not significantly improve the regression. The 
results of ANOVA depend on the order of the components; components 
with the highest concentrations were included first and hence were 
more likely to be retained. We removed insignificant components and 
then built a new regression and performed a new ANOVA with fewer 
components. We repeated this procedure until all components 
were statistically significant (P<0.05 after Bonferroni correction for 
multiple testing), resulting in a model with 16 components 
(Model-16). The regression results are shown in Supplementary Table 7. To test 
the effect of additional conditions on our model (see Supplementary 
Table 5 for full condition list), we added them and repeated the 
ANOVA test. 
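The regression step of this model is ordinary least squares. The authors used lm in R followed by sequential ANOVA to drop components. The sketch below covers only the least-squares fit of $C$ and the $\beta$ coefficients, written in pure Python for a hypothetical two-component model (rich media and furfural) on synthetic data. Because the data are built from known coefficients, you can check that the fit recovers them. For real data, a numerical library would be preferable to solving the normal equations by hand.

```python
def fit_linear_mixture(y, components):
    """Ordinary least-squares fit of y(g) = C + sum_k beta_k * x_k(g).

    y: average gene fitness in hydrolysate, one value per gene.
    components: dict mapping a condition name to its gene fitness values,
    in the same gene order as y.
    Returns (C, {name: beta}, R^2).
    """
    names = list(components)
    n = len(y)
    # Design matrix: a column of ones for the intercept C, then one column per component.
    X = [[1.0] + [components[k][i] for k in names] for i in range(n)]
    p = len(X[0])
    # Normal equations (X^T X) b = X^T y.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)] for r in range(p)]
    b = [sum(X[i][r] * y[i] for i in range(n)) for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                for c in range(col, p):
                    A[r][c] -= factor * A[col][c]
                b[r] -= factor * b[col]
    coef = [b[r] / A[r][r] for r in range(p)]
    fitted = [sum(c * x for c, x in zip(coef, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coef[0], dict(zip(names, coef[1:])), 1 - ss_res / ss_tot


# Hypothetical gene fitness values for six genes in two component conditions.
rich = [0.0, -0.5, 0.2, -1.0, 0.1, -0.3]
furfural = [-1.0, 0.0, -0.2, -2.0, 0.3, -0.6]
# Synthetic "hydrolysate" fitness built from C = 0.1, beta_rich = 0.5, beta_furfural = 0.8.
hydrolysate = [0.1 + 0.5 * r + 0.8 * f for r, f in zip(rich, furfural)]

C, betas, r2 = fit_linear_mixture(hydrolysate, {"rich": rich, "furfural": furfural})
print(round(C, 3), {k: round(v, 3) for k, v in betas.items()}, round(r2, 3))
```

The backward-elimination loop described above would wrap this fit, refitting after each round of dropping components that fail the ANOVA test.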

We also considered adding interaction terms to test for possible 
synergistic effects of components. We considered adding all terms of 
the form X × Y, where X and Y are components, and used a P-value 
cutoff of 10 ~ ^ to make up for the large number of terms tested. We 
identified 20 significant interaction terms. As with the linear regression 
with individual components, we then used ANOVA to see if these terms 
were significant when used in combination, again requiring P< 10 
The resulting model contained only three interaction terms. Adding 
these terms to Model-16 improves the adjusted R² from 0.880 to 0.893 
and alters the predicted fitness of four genes by 0.5 or more. The 
affected genes were ZMO0975, ZMO1430, ZMO1431, and ZMO1432; 
ZMO1429, which is in an operon with ZMO1430-ZMO1432, is also near the 
threshold. All five of these genes were identified as having non-stable 
insertions, so we are not sure that the improvement of fit for these 
genes is biologically meaningful. 
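Adjusted R² is a natural yardstick for deciding whether extra interaction terms are worth keeping, because it discounts the raw R² for the number of predictors. A minimal sketch of the standard formula follows; the observation and predictor counts in the example are hypothetical. The helper `n_pairwise_terms` also counts the distinct X × Y products that a search over k components must test (assuming X and Y are distinct), which is why a stringent P-value cutoff is needed. The text does not say whether the pairs were formed from the 16 retained components or from all 37; these give 120 and 666 pairs, respectively.

```python
import math


def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), p excluding the intercept."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def n_pairwise_terms(n_components):
    """Number of distinct X x Y products among n_components predictors."""
    return math.comb(n_components, 2)


# Hypothetical numbers: a small R^2 gain from five extra predictors can still
# lower adjusted R^2 when there are few observations.
base = adjusted_r2(0.80, n_obs=50, n_predictors=5)
extended = adjusted_r2(0.81, n_obs=50, n_predictors=10)
print(f"{base:.3f} -> {extended:.3f}")
print(n_pairwise_terms(16))  # 120
```

With thousands of genes as observations, the penalty per added term is small, so an increase in adjusted R² (as from 0.880 to 0.893 here) mainly reflects a genuine improvement in fit.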



Yeast pooled fitness experiments 

Fitness experiments were performed using a S. cerevisiae homozygous 
deletion pool. For each pool experiment, 100 µl of a frozen pool 
aliquot was diluted into 50 ml YPD media and grown for 6 h to an 
OD600 of 2.4. Once the pool had recovered, we collected a 'START' 
sample and then inoculated it into various experimental conditions at a 
starting OD600 of 0.03. Pool experiments were performed in 24-well or 
48-well formats and grown in a TECAN Infinite F200 plate reader. 
Samples were grown in the condition of interest for about seven 
generations (typically to saturation) and then the 'END' sample was 
collected. Either 1 ml (24-well format) or 2.1 ml (48-well format) was 
collected for genomic DNA isolation (YeaStar Genomic DNA Kit, Zymo 
Research). DNA barcode amplification by PCR, Affymetrix 
hybridization, washing, and scanning of TAG4 arrays was done as previously 
described (Pierce et al, 2006). Using custom R scripts, we set the 
median fitness value, as computed using the UPTAG or DNTAG, to zero 
and then averaged across the measurements for each gene 
(Supplementary Dataset 1). 'Gene fitness' and 'strain fitness' values 
are the same (log2 ratio of END/START) for S. cerevisiae because only 
one deletion strain exists per gene in the homozygous pool. For most 
yeast strains, we have both UPTAG and DNTAG measurements of 
abundance. In addition, the Affymetrix TAG4 array has five replicate 
probe spots for each UPTAG and DNTAG; therefore, 'gene fitness' is an 
average of multiple measurements (i.e., two different tags and probe 
replicates). To identify putative S. cerevisiae tolerance genes, we 
plotted the gene fitness data in YPD-rich media versus gene fitness in 
YPD-rich media supplemented with batch 1 hydrolysate, and 
selected genes using the following criteria: $f_{\text{hydrolysate}} < -1$ 
and $f_{\text{hydrolysate}} < f_{\text{rich}} - 1$ (Supplementary Figure 13; 
Supplementary Dataset 2). Gene and GO annotations for S288C were 
obtained from the SGD database (http://www.yeastgenome.org/). 
We used GO annotations to help classify our tolerance gene list into six 
broad functional groups (Table II). 
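The two-part selection criterion for yeast tolerance genes translates directly into a filter. In this sketch, the gene names (yfg1-yfg4) and fitness values are hypothetical, and genes missing from the rich-media data are simply skipped, which is an assumption rather than something the text specifies.

```python
def select_tolerance_genes(fit_rich, fit_hydrolysate, threshold=-1.0, margin=1.0):
    """Genes with fitness < -1 in hydrolysate and more than 1 unit below rich media."""
    return sorted(
        g for g, fh in fit_hydrolysate.items()
        if g in fit_rich and fh < threshold and fh < fit_rich[g] - margin
    )


# Hypothetical gene fitness values (log2 END/START) for four deletion strains.
rich = {"yfg1": 0.1, "yfg2": -1.5, "yfg3": 0.0, "yfg4": -0.2}
hydrolysate = {"yfg1": -2.0, "yfg2": -2.0, "yfg3": -0.8, "yfg4": -1.1}
print(select_tolerance_genes(rich, hydrolysate))  # ['yfg1']
```

The second condition excludes yfg2, which is equally sick in hydrolysate but already sick in rich media, so its defect is not specific to hydrolysate.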



Identification of promoters for use in Z. mobilis 

We developed a series of Gateway (Invitrogen) adapted vectors for 
overexpression of genes in Z. mobilis. The system is based on pJS71, a 
broad-host-range plasmid derived from pBBR1MCS (Skerker and Shapiro, 
2000) that we found to be stably maintained in Z. mobilis. Using a 
fluorescence-based assay with superfolder GFP (Pedelacq et al, 
2005), we tested the relative strengths of one E. coli promoter (PBAD) 
and one Z. mobilis promoter (Pgap) (Conway et al, 1987; Guzman et al, 
1995). Each promoter construct was made using the Gateway system 
by fusing the promoter of interest to GFP in a pJS71-based 
destination vector (Supplementary Table 6). These strains were grown 
in 10 ml ZRMG cultures with spectinomycin. For PBAD, 
cultures were induced with increasing concentrations of arabinose. 
Once saturated, cultures were diluted 1:1000 and allowed to grow until 
they reached an OD600 of 0.4. Cells were washed twice with 1× PBS. 
After washing, 150 µl of resuspended cells was pipetted into a 96-well 
assay plate (Costar 3603). In addition, a Z. mobilis control strain 
carrying only the pJS71 plasmid was used as a negative control. 
Fluorescence (RFU) was measured in a Tecan Safire plate reader. 
Average fluorescence of the control strain was subtracted from the 
average of each experimental sample. Relative promoter activity was 
determined by calculating RFU/OD for all samples and averaged over 
three biological replicates (Supplementary Figure 14). 
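The promoter-activity calculation can be sketched as follows. The text leaves the exact order of operations open; this version assumes the mean RFU of the pJS71-only control is subtracted from each replicate before dividing by that replicate's OD600, and the readings are hypothetical.

```python
from statistics import mean


def relative_promoter_activity(replicates, control_rfu):
    """Background-subtracted RFU per OD600, averaged over biological replicates.

    replicates: list of (rfu, od600) readings for one promoter construct.
    control_rfu: mean RFU of the plasmid-only control strain.
    """
    return mean((rfu - control_rfu) / od600 for rfu, od600 in replicates)


# Hypothetical readings for three biological replicates of one construct.
replicates = [(1200.0, 0.5), (1000.0, 0.5), (1400.0, 0.5)]
print(relative_promoter_activity(replicates, control_rfu=200.0))
```

Normalizing each replicate by its own OD600 keeps small differences in final culture density from being read as differences in promoter strength.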



Z. mobilis growth experiments 

Growth experiments were performed in 24-well and 96-well 
microplates. Single transposon mutant characterization was performed in 
150 µl volume, 96-well format and in 1 ml volume, 24-well format, 
using either a Tecan Sunrise or Tecan Infinite F200 plate reader (Tecan 
Systems Inc., San Jose, CA). Starting OD600 for all Z. mobilis strains 
was 0.02. For Z. mobilis transposon mutants, saturated overnight 
ZRMG cultures were diluted 1:20, grown to OD600 0.5, and then used to 
inoculate a plate containing 1× ZRMG with and without 10% batch 1 
or 8% batch 2 miscanthus hydrolysate (% v/v). Complementation and 
overexpression experiments were done using a Tecan Sunrise reader in 
a 96-well format. Strains were grown overnight in ZRMG + 100 µg/ml 
spectinomycin. For strains in which PBAD controlled expression of the 
relevant gene, 2% arabinose was added to the media. Once saturated, 
cultures were diluted 1:20 and grown to an OD600 of 0.5. Strains were 
inoculated at a starting OD600 of 0.1 into plates containing ZRMG, with 
and without inhibitor. For these experiments, inhibitor concentrations 
were 10% batch 1 hydrolysate or 1.5 mM methylglyoxal in 150 µl total 
volume per well. All growth experiments were run for up to 72 h. 
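Reaching a fixed starting OD600 from a denser culture is a simple C1V1 = C2V2 dilution. Here is a small sketch, assuming OD600 scales linearly with cell density over this range. The function name is illustrative only.

```python
def inoculum_volume_ul(od_culture, od_target, final_volume_ul):
    """Culture volume (µl) to add so the well starts at od_target (C1V1 = C2V2)."""
    if od_culture <= od_target:
        raise ValueError("the culture must be denser than the target starting OD600")
    return od_target * final_volume_ul / od_culture


# A culture grown to OD600 0.5, diluted to a starting OD600 of 0.02:
for well_volume in (150, 1000):  # 96-well (150 µl) and 24-well (1 ml) formats
    print(f"{well_volume} µl well: add {inoculum_volume_ul(0.5, 0.02, well_volume):.1f} µl")
```

This gives 6.0 µl of culture for a 150 µl well and 40.0 µl for a 1 ml well, with medium making up the remaining volume.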

Construction of expression clones for 
complementation and overexpression 

We used the PBAD and Pgap promoters for complementation of tolerance 
gene mutants and for testing the effect of overexpression on growth 
and fermentation performance. Each promoter construct included an 
in-frame N-terminal FLAG tag (DYKDDDDK), so protein expression 
could be detected by western blot with anti-FLAG antibodies 
(Sigma-Aldrich). A series of pENTR clones were generated for putative 
Z. mobilis tolerance genes (Supplementary Table 6). We used 
Gateway cloning to generate PBAD-ZMOxxxx or Pgap-ZMOxxxx 
expression clones, where ZMOxxxx is the systematic gene name 
(Supplementary Table 6). These plasmids were electroporated into 
wild-type Z. mobilis to examine the effects of overexpression on cell 
growth and ethanol production. For electroporation, competent cells 
were made from an overnight wild-type culture grown to an OD600 of 0.8-1.0. 
Cells were harvested by centrifugation and resuspended twice 
in ice-cold sterile Milli-Q water (EMD Millipore, Billerica, MA). Cells 
were then washed twice with ice-cold 10% glycerol and frozen in 80 µl 
aliquots at -80°C. To perform the electroporation, 40 µl of cells was 
thawed on ice and 1 µg of DNA was added, along with 2.5 µg of Type 
One Restriction Inhibitor (Epicentre). The transformation mixture was 
pipetted into an electroporation cuvette and cells were electroporated 
using the following settings: 1600 V, 200 ohms, and 25 µF. Cells were 
recovered in ZRMG media for 6 h with shaking at 30°C. After recovery, 
10 µl or 100 µl of recovered cells was spread on selective plates and 
incubated at 30°C for at least 2 days. A single colony for each 
overexpression strain was archived at -80°C (Supplementary 
Table 6). The same constructs were also used for complementation 
studies, except that they were electroporated into the corresponding 
mutant (ZMOxxxx::TN5) strain. Overexpression and complementation 
experiments were performed with at least three biological replicates, 
using freshly streaked single colonies from ZRMG + spectinomycin 
plates. 



Western blot analysis of overexpression strains 

The relative expression of each clone used for overexpression was 
examined by western blot analysis (Supplementary Figure 15). Strains 
containing expression plasmids were grown in ZRMG + 100 µg/ml 
spectinomycin + 2% arabinose. After 24 h, cultures were diluted 1:20, 
grown to an OD600 of 0.5, and 1 ml was harvested. Cells were resuspended in 
freshly made 1× SDS-PAGE sample buffer (Bio-Rad) containing 
β-mercaptoethanol and boiled for 5 min. All samples were stored at 
-80°C until used. Protein samples were separated by SDS-PAGE 
(using precast 2-40% gradient gels, Bio-Rad), transferred at 4°C onto a 
PVDF membrane, and then blocked overnight in PBST + 5% non-fat 
instant dry milk. Once blocked, the membrane was washed briefly in 
PBS + 3% non-fat milk and subsequently incubated with the primary 
antibody, 1:5000 dilution of anti-FLAG (Sigma F3165), for 1 h, shaking 
at room temperature. After thoroughly washing with PBST, the 
membrane was incubated with the secondary antibody, 1:1000 goat 
anti-mouse horseradish peroxidase (Thermo Scientific), again for 1 h 
with shaking. The membrane was washed with PBST and incubated 
with ECL reagent (1:1 mixture, Thermo Scientific) for 5 min. 
After allowing to drip dry briefly, the blot was imaged using a Fujifilm 
LAS-4000 imager in chemiluminescence mode. 



Fermentation experiments 

The fermentation capability of mutants was measured during aerobic 
growth in rich media with and without hydrolysate (Figures 1 and 6; 
Supplementary Figure 16). Starter cultures were grown in 1× 
ZRMG + 100 µg/ml spectinomycin and induced with 2% arabinose. 
After 24 h, cultures were diluted 1:20 and grown to an OD600 of 0.5. 
Fermentations were started at an OD600 of 0.1 and set up in 10 ml tubes, 
shaking at 200 r.p.m. at 30°C. Fermentation media contained 1× ZRMG, 
2% arabinose, and spectinomycin for control conditions. To test the 
effects of hydrolysate on fermentation, the media was supplemented 
with 8% (v/v) batch 2 hydrolysate. Samples were collected every 3 h 
for 50 h. At each time point, OD600 was measured and supernatant was 
removed for HPLC analysis of glucose and ethanol concentrations. 
Vials were stored at 4°C until processed. For HPLC analysis of the 
supernatant, samples were analyzed at 55 °C on a Rezex RFQ fast acid 
column (Phenomenex, Torrance, CA) and compounds were 
eluted with 0.005 M sulfuric acid at a flow rate of 1 ml/min. Ethanol 
and glucose were detected by RID. Specific ethanol productivity 
(g/l/h/OD600) was calculated during the time period (t1 to t2) when 
glucose was being consumed, using the formula

$$\frac{\text{ethanol}_{t_2} - \text{ethanol}_{t_1}}{(t_2 - t_1)\,\left(\mathrm{OD}_{600,t_1} + \mathrm{OD}_{600,t_2}\right)/2}.$$

The following intervals 
were used: WT + pJS71 (0-15 h), WT + pJS71 in hydrolysate 
(0-27 h), WT + PBAD-ZMO1875 in hydrolysate (0-18 h). Fermentation 
experiments for WT + pJS71 and WT + PBAD-ZMO1875 were 
performed using four biological replicates, and average 
productivity values are reported. Fermentation experiments using 
WT + PBAD-ZMO1722, WT + PBAD-ZMO0760, and WT + PBAD-ZMO0100 
were performed in duplicate. 
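The specific ethanol productivity formula divides the ethanol production rate over the interval by the mean OD600 at its two endpoints. Here is a minimal sketch, with hypothetical time points and concentrations (ethanol in g/l, time in h). The function name is illustrative only.

```python
def specific_ethanol_productivity(t1, t2, ethanol_t1, ethanol_t2, od_t1, od_t2):
    """(ethanol_t2 - ethanol_t1) / (t2 - t1) / mean OD600, in g/l/h/OD600."""
    if t2 <= t1:
        raise ValueError("t2 must be later than t1")
    mean_od = (od_t1 + od_t2) / 2
    return (ethanol_t2 - ethanol_t1) / (t2 - t1) / mean_od


# Hypothetical interval: 0-18 h, ethanol rises from 0.5 to 9.5 g/l
# while OD600 rises from 0.1 to 1.9 (mean OD600 of 1.0).
print(specific_ethanol_productivity(0, 18, 0.5, 9.5, 0.1, 1.9))
```

Normalizing by mean OD600 separates per-cell fermentative output from differences in how much biomass each strain accumulates.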



Supplementary information 

Supplementary information is available at the Molecular Systems 
Biology website (www.nature.com/msb). 